FOREWORD


Among the responsibilities assigned to the Office of the Manager, National
Communications System, is the management of the Federal Telecommunication
Standards Program. Under this program, the NCS, with the assistance of the
Federal Telecommunication Standards Committee, identifies, develops, and
coordinates proposed Federal Standards which contribute either to the
interoperability of functionally similar Federal telecommunication systems or
to the achievement of a compatible and efficient interface between computer
and telecommunication systems. In developing and coordinating these standards,
a considerable amount of effort is expended in initiating and pursuing joint
standards development efforts with appropriate technical committees of the
Electronic Industries Association, the American National Standards Institute,
the International Organization for Standardization, and the International
Telegraph and Telephone Consultative Committee of the International
Telecommunication Union. This Technical Information Bulletin presents an
overview of an effort which is contributing to the development of compatible
Federal, national, and international standards in the area of digital
facsimile. It has been prepared to inform interested Federal
activities of the progress of these efforts. Any comments, inputs, or
statements of requirements which could assist in the advancement of this work
are welcome and should be addressed to:


Office  of  the  Manager 
National  Communications  System 
ATTN:  NCS-TS 
Washington,  D.C.  20305 
1.  INTRODUCTION

This report summarizes the results of work performed
under contract DCA100-81-C-0024 with the Defense Communications
Agency. Consideration is now being given to possible CCITT
standards for Group 4 Facsimile, which refers to the transmission
of an A4 sized page over data networks that provide
error control. It is likely that the basic coding technique
for Group 4 transmissions will be some advanced form of the
Modified READ code, which is the optional compression algorithm
for Group 3. This study investigates
the more advanced mixed mode coding techniques for possible
consideration as an optional algorithm for Group 4. In a
mixed mode system the information printed on a page is
divided into two parts: symbols (letters, numerals,
punctuation, etc.) and graphics (logos, signatures, sketches,
etc.). The specific aim was to examine possible
techniques for segmenting a document into graphic and symbol
areas, and to assemble a code that represents the entire document.

Parameters to be considered include compression, commonality
with facsimile and TELETEX* transmissions, and complexity
of implementation.

Six segmentation techniques were selected for analysis.
The techniques were designed to differ from each other as
much as possible, so as to display a wide variety of
characteristics. For each technique, many minor modifications
would be possible, but it is not expected that these
modifications would alter the conclusions drawn from the study.

* TELETEX refers to a CCITT recommendation which is now
under development for communication between word processors.

The  segmentation  techniques  analyzed  are: 

-  SYMBOL  REMOVAL/SCAN  LINE 

-  SYMBOL  REMOVAL/FULL  DOCUMENT 

-  SYMBOL  REMOVAL/LINE  OF  SYMBOLS 

-  EXTENDED  TELETEX 

-  PARTIAL  LINE  OF  SYMBOLS 

-  FULL  LINE  OF  SYMBOLS 

Section  2  presents  the  methodology  used  for  measuring 
compression.  Section  3  presents  descriptions  of  the  six 
mixed-mode  segmentation  alternatives  considered,  together  with 
the  calculations  resulting  in  compression  estimates.  Section  4 
discusses the commonality of each alternative with Group 3
facsimile, Group 4 facsimile, and TELETEX. Section 5
discusses  the  complexity  of  implementation  of  each  technique. 
Section  6  compares  the  alternatives  and  draws  conclusions, 
while  recommendations  are  made  in  Section  7. 
2.0  METHODOLOGY FOR MEASURING COMPRESSION


For  each  of  the  six  proposed  mixed  mode  techniques,  an 
estimate  of  the  compression  has  been  made.  Where  applicable, 
an  additional  estimate  was  made  of  the  compression  using  a 
Carriage  Return/Line  Feed  (CR/LF)  symbol  to  terminate  a  line 
of  symbols.  Estimates  of  compression  were  made  for  CCITT 
test  Documents  1  and  5.  Document  5  was  slightly  modified  by 
removing  the  vertical  line  in  the  center  of  the  page.  If  this 
were  not  done,  the  technique  FULL  LINE  OF  SYMBOLS  would  not 
be  able  to  encode  any  symbols. 

It  should  be  emphasized  that  the  compression  values 
calculated  in  this  report  are  estimates  only,  and  should 
not  be  regarded  as  actual  measured  numbers.  However,  it  is 
expected  that  the  relative  compressions  of  the  various 
segmentation  techniques  are  accurate,  since  the  same  assumptions 
and  estimates  were  used  for  all  of  them. 

2.1  ASSUMPTIONS 

In  making  compression  estimates  the  following  assumptions 
were  made: 

(1)  Each  symbol  is  encoded  using  8  bits,  which  allows 
up  to  256  different  symbols. 

(2)  Several of the 256 symbol codes can be made available
for indicating the termination of symbol transmission,
or other requirements of the segmentation technique
employed.

(3)  A stored library, suitable for the document being
transmitted, is available at both sending and
receiving terminals.

(4)  Bits required to identify the proper symbol
library to the receiving terminal are neglected.

(5)  The stored library will accommodate either fixed
or proportionally spaced fonts, including several
widths for word spaces.

(6)  All characters of the principal font used in the
document are in fact recognized as such, and will
be encoded as symbols, subject to the rules of the
proposed technique.

(7)  Lines of symbols can be accumulated despite slight
skews of the printed lines.

(8)  The characters of the principal font include math
symbols, italics, and Greek letters, but not
subscripts or superscripts, or long horizontal or
vertical lines.

(9)  Graphic data is transmitted using the Modified
READ code, without EOL's.

(10)  The number of bits required to transmit increased
width of white spaces by means of Modified READ can
be neglected. This is because the spacing between groups
of black pels (such as symbols) usually only has to
be specified once, and the READ code length does
not grow rapidly with the length of a white run.

(11)  Each document consists of 2,376 rows with 1,728 pels
per row (7.7 pels per mm, or approximately 196 pels
per inch). See Appendix A.

(12)  Code transmissions will not experience any
transmission errors, so addition of redundancy for
error control is not required.


2.2  PRELIMINARY  CALCULATIONS 

2.2.1  BITS  PER  SYMBOL  USING  MODIFIED  READ  CODE 

It is assumed that there is an average number of
bits required to transmit a text symbol by means of the
Modified READ code. In order to obtain this average, CCITT
Document 4 (see Figure 2-1), which is almost 100%
text, was used. Document 4 was found to have a total of 4,001 symbols.
Appendix A indicates that it took 585,074 bits to transmit
this document using the Modified READ code, with K = ∞.
Included in this total are 2,376 EOL's at 12 bits each, or 28,512
bits, and about 454 blank scan lines (not including those
between adjacent lines of text), each of which takes one bit
to code, or 454 bits. Subtracting both of these from the
total, it took 556,108 bits to transmit the text, or

    556,108 / 4,001 = 138.992 bits per symbol.
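The subtraction and division above can be reproduced with a short sketch. The figures are those quoted in this section; the function and variable names are illustrative only and do not come from the report.

```python
# Figures quoted in Section 2.2.1 for CCITT Document 4 (Modified READ, K = infinity).
TOTAL_BITS_DOC4 = 585_074    # Appendix A total, as quoted in the text
SCAN_LINES = 2_376           # one EOL code per scan line
EOL_BITS = 12                # bits per EOL code
BLANK_SCAN_LINES = 454       # blank lines outside the text body, one bit each
SYMBOLS_DOC4 = 4_001         # symbols counted on Document 4


def bits_per_symbol(total_bits, scan_lines, eol_bits, blank_lines, symbols):
    """Average Modified READ bits per text symbol after removing line overhead."""
    overhead = scan_lines * eol_bits + blank_lines  # EOL's plus one bit per blank line
    return (total_bits - overhead) / symbols


BITS_PER_SYMBOL = bits_per_symbol(
    TOTAL_BITS_DOC4, SCAN_LINES, EOL_BITS, BLANK_SCAN_LINES, SYMBOLS_DOC4
)
print(f"{BITS_PER_SYMBOL:.3f} bits per symbol")  # 138.992 bits per symbol
```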

2.2.2  GRAPHIC  BITS  PER  DOCUMENT 
2.2.2.1  DOCUMENT 1

Now consider Document 1, shown in Figure 2-2.
Appendix A indicates that it took 175,704 bits to transmit with
Modified READ (K = ∞), which, less 28,512 bits for EOL's,
gives 147,192 bits for the information on the page. It is
assumed that the Combined Symbol Matching (CSM) Technique
(Appendix B), which uses a transmitted library, encodes all
characters on the page, with the exception of the "SLEREXE"
line, whose characters are too large for the symbol block
size. By actual count, the number of symbols that would be
coded is 937, not including spaces. (In fact, Appendix B
shows that 988 symbols were recognized.) Using Modified READ,
this should take 937 x 138.992 = 130,236 bits. Therefore the
number of bits required to transmit the graphics on Document
1 is 147,192 - 130,236 = 16,956 bits. This compares with the
18,549 bits used to transmit Residue Code given by Appendix
B. An alternate calculation of the average bits per symbol
can be obtained by subtracting the 18,549 residue bits of
Appendix B from the 147,192 bits to transmit the whole
document, giving 128,643 bits to transmit the text symbols.
Dividing this by the 937 symbols gives an average of 137.29 bits
per symbol, which compares favorably with 138.992 obtained in
Section 2.2.1.

[Figure 2-1  CCITT Document 4: page image not reproduced]

[Figure 2-2  CCITT Document 1: page image not reproduced]

In a stored library approach, it is assumed that the
letterhead lines, except the "SAPORS LANE" line, would not
be encoded because the character sizes differ greatly from the
characters in the body of the text. This means that 107
symbols out of the 937 would not be encoded, which would take
107 x 138.992 = 14,872 bits to transmit by Modified READ.
Therefore the total residue that would not be symbol encoded
would be 16,956 + 14,872 = 31,828 bits for Document 1.
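As a check on the Document 1 arithmetic, the following sketch recomputes the graphics estimate, the stored-library residue, and the alternate bits-per-symbol figure. All inputs are numbers quoted in Sections 2.2.1 and 2.2.2.1; the variable names are illustrative, and the unrounded average (556,108 / 4,001) is assumed, since that is what reproduces the rounded totals in the text.

```python
# Average Modified READ bits per symbol from Section 2.2.1 (unrounded).
BITS_PER_SYMBOL = 556_108 / 4_001

# Document 1 figures quoted in the text.
total_bits = 175_704          # Modified READ, K = infinity (Appendix A)
eol_bits = 2_376 * 12         # 28,512 bits of EOL codes
coded_symbols = 937           # symbols counted, excluding spaces
letterhead_symbols = 107      # symbols a stored library would not encode
csm_residue_bits = 18_549     # Residue Code bits reported in Appendix B

page_bits = total_bits - eol_bits                        # information on the page
symbol_bits = round(coded_symbols * BITS_PER_SYMBOL)     # text sent by Modified READ
graphics_bits = page_bits - symbol_bits                  # estimated graphics content
unencoded_bits = round(letterhead_symbols * BITS_PER_SYMBOL)
stored_library_residue = graphics_bits + unencoded_bits

# Alternate estimate of bits per symbol, using the CSM residue instead.
alt_bits_per_symbol = (page_bits - csm_residue_bits) / coded_symbols

print(page_bits, symbol_bits, graphics_bits)       # 147192 130236 16956
print(unencoded_bits, stored_library_residue)      # 14872 31828
print(f"{alt_bits_per_symbol:.2f}")                # 137.29
```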
2.2.2.2  DOCUMENT 5

Now consider Document 5, as shown in Figure 2-3. Appendix
A gives 288,655 bits for Modified READ (K = ∞), which, less 28,512
bits for EOL's, gives 260,143 bits for the information on the page.
An actual count of encoded symbols (including math symbols
and Greek letters but not subscripts or superscripts) is
1,599. This should take 1,599 x 138.992 = 222,249 bits, which,
subtracted from 260,143 bits for the entire image, gives
37,894 bits for graphics. This compares with the 42,014
residue bits given by Appendix B for CSM. Subtracting the
CSM residue from 260,143 bits gives 218,129 bits for transmitting
the symbols by Modified READ, or an average of

    218,129 / 1,599 = 136.42 bits per symbol,

which again compares favorably with
the 138.992 bits per symbol from Section 2.2.1.

Therefore, the number of bits required to transmit the information
which cannot be symbol encoded is estimated to be 37,894 for
Document 5. To this must be added bits for symbols that
cannot be encoded using a particular technique.
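The Document 5 estimate and its cross-check against the CSM residue can be sketched as follows. The inputs are the figures quoted in this section; the names and the "relative gap" measure are illustrative additions, not part of the report.

```python
# Average Modified READ bits per symbol from Section 2.2.1 (unrounded).
BITS_PER_SYMBOL = 556_108 / 4_001

# Document 5 figures quoted in the text.
total_bits = 288_655        # Modified READ, K = infinity (Appendix A)
eol_bits = 2_376 * 12       # 28,512 bits of EOL codes
coded_symbols = 1_599       # counted symbols, excluding sub/superscripts
csm_residue_bits = 42_014   # residue bits reported in Appendix B for CSM

page_bits = total_bits - eol_bits
symbol_bits = round(coded_symbols * BITS_PER_SYMBOL)
graphics_bits = page_bits - symbol_bits

# Cross-check: bits per symbol implied by the CSM residue.
alt_bits_per_symbol = (page_bits - csm_residue_bits) / coded_symbols
relative_gap = (BITS_PER_SYMBOL - alt_bits_per_symbol) / BITS_PER_SYMBOL

print(page_bits, symbol_bits, graphics_bits)    # 260143 222249 37894
print(f"{alt_bits_per_symbol:.2f}")             # 136.42
print(f"gap vs. Section 2.2.1: {relative_gap:.1%}")
```

The gap between the two per-symbol figures is under 2 percent, which is the sense in which the text says they compare favorably.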

2.2.3  SCAN  LINES  WITH  SYMBOL  DETECTIONS 


The basis for most of the techniques is the detection of symbols
as the scan lines touch the top of the symbol. The number of scan lines
on which symbols are detected is obtained from Appendix B. This is given by
the bits for COLADD divided by 11. For Document 1, this is
227, or about 8.4 scan lines per line of text. Since
3 lines of text that were encoded by CSM will not be encoded in
the proposed techniques, 25 scan lines are subtracted, giving
202 scan lines with symbol detections for Document 1. For
Document 5, 445 scan lines will have symbol detections.

[Figure 2-3  CCITT Document 5: page image not reproduced]


2.3  INCREASED  RESOLUTION 

Consideration is being given to having the option of
higher resolution for Group 4 Facsimile. Therefore it was
desired to calculate the expected compression that could
be obtained using 300 pels per inch and 400 pels per inch.
Data comparable to that used for estimating compression for
200 pels per inch was not available.

It  was  felt  that  the  number  of  bits  required  to  transmit 
a  document  with  Modified  READ  code  is  approximately  linearly 
proportional  to  the  resolution  in  pels  per  inch,  despite  the 
fact that the total number of pels in the document is proportional
to the square of the resolution. The increase in bits
is  primarily  due  to  the  increased  number  of  scan  lines.  The 
increased  width  of  the  scan  lines  does  not  greatly  increase 
the  number  of  bits  required,  and  the  finer  spacing  of  the 
scan  lines  tends  to  produce  more  efficient  vertical  coding, 
since  a  scan  line  looks  more  like  the  previous  scan  line  than 
with  lower  resolution.  To  confirm  this  impression,  Group  4 
Modified  READ  data  was  obtained  for  200,  300  and  400  pels  per 
inch  for  Document  1  and  Document  5.  The  number  of  bits 
required  to  transmit  these  documents  is  shown  in  Table  2-1. 

This data is not comparable to that of Appendix A for 200
pels per inch because a different scan was used, which resulted
in fewer bits required. In addition, Table 2-1 shows the number
of bits required for each resolution divided by the number of
bits required for 200 pels per inch. From these ratios, it
can be seen that in most cases the assumption that bits increase
linearly with resolution is a good one.

                              DOCUMENT 1                  DOCUMENT 5
RESOLUTION   RESOLUTION               Bits/Bits for               Bits/Bits for
   (lpi)        /200      Bits/Page      200 lpi      Bits/Page      200 lpi

    200          1.0       132,034        1.000        229,204        1.000
    300          1.5       225,499        1.708        350,538        1.529
    400          2.0       272,312        2.062        468,005        2.042

Table 2-1
COMPRESSED BITS/PAGE FOR VARIOUS RESOLUTIONS
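The ratios in Table 2-1 can be recomputed from the bits-per-page values, which also shows how far each case departs from strict linearity. This is a small sketch using the table's figures; the dictionary layout and function names are illustrative.

```python
# Bits per page from Table 2-1, keyed by resolution in lpi.
TABLE_2_1 = {
    "Document 1": {200: 132_034, 300: 225_499, 400: 272_312},
    "Document 5": {200: 229_204, 300: 350_538, 400: 468_005},
}


def ratios_to_base(bits_by_lpi, base_lpi=200):
    """Bits at each resolution divided by bits at the base resolution."""
    base_bits = bits_by_lpi[base_lpi]
    return {lpi: bits / base_bits for lpi, bits in bits_by_lpi.items()}


def deviation_from_linear(bits_by_lpi, base_lpi=200):
    """Measured ratio divided by the ratio a strictly linear model predicts."""
    measured = ratios_to_base(bits_by_lpi, base_lpi)
    return {lpi: r / (lpi / base_lpi) for lpi, r in measured.items()}


for name, bits_by_lpi in TABLE_2_1.items():
    measured = ratios_to_base(bits_by_lpi)
    deviation = deviation_from_linear(bits_by_lpi)
    for lpi in sorted(bits_by_lpi):
        print(f"{name} {lpi} lpi: ratio {measured[lpi]:.3f}, "
              f"{deviation[lpi]:.3f} x linear")
```

Document 1 at 300 lpi is the outlier (about 14 percent above a linear prediction); the other cases fall within a few percent, consistent with the "in most cases" wording.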

Table  2-2  shows  the  basic  constants  used  in  making  the 
estimates  of  compression.  The  values  for  200  lpi  are  obtained 
from  Sections  2.1  and  2.2.  For  the  other  resolutions,  Pels 
per  Scan  Line,  Scan  Lines,  Bits  per  Symbol  in  READ  Mode,  and 
Graphics  Bits  per  Document  are  all  increased  in  direct 
proportion  to  the  resolution.  Scan  Lines  with  Symbol  Detections 
also  increases  in  proportion  to  resolution,  but  since  it 
cannot  exceed  the  number  of  symbols  on  the  document,  there  is 
obviously  a  limit  beyond  which  proportionality  will  not  apply. 
Therefore  the  values  are  decreased  slightly  below  proportionality, 
especially for 400 lpi. Bits required for Horizontal and
Vertical Position are adjusted to account for the number of
pels per scan line and number of scan lines respectively.

The  numbers  from  Table  2-2  are  then  used  to  calculate 
compression  for  300  and  400  lpi  in  the  same  way  that  the 
compression  for  200  lpi  is  calculated. 


2.4  CALCULATING  COMPRESSION 

The  number  of  bits  required  to  construct  the  message  is 
totaled.  This  includes  symbol  codes,  graphics,  mode  changes, 
and  horizontal  and  vertical  positions.  The  compression  is 
calculated by dividing the total number of image pels, which is
always 2,376 x 1,728 = 4,105,728, by the total message bits.
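As a minimal sketch of this calculation (the constant and function names are assumptions for illustration, not part of the report), the compression figure is simply the fixed pel count divided by the message length:

```python
SCAN_LINES = 2_376       # scan lines per page, from the text
PELS_PER_LINE = 1_728    # pels per scan line, from the text
IMAGE_PELS = SCAN_LINES * PELS_PER_LINE


def compression(message_bits):
    """Image pels divided by the total number of message bits."""
    if message_bits <= 0:
        raise ValueError("message_bits must be positive")
    return IMAGE_PELS / message_bits


print(IMAGE_PELS)
# 51,590 bits is the Document 1 total quoted in Section 3.1.2.
print(round(compression(51_590), 1))
```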

3.0  MIXED MODE ALTERNATIVES

In this section each of the six mixed-mode segmentation
techniques is considered in turn. The techniques are:

SYMBOL  REMOVAL/SCAN  LINE 
SYMBOL  REMOVAL/FULL  DOCUMENT 
SYMBOL  REMOVAL/LINE  OF  SYMBOLS 
EXTENDED  TELETEX 
PARTIAL  LINE  OF  SYMBOLS 
FULL  LINE  OF  SYMBOLS 

For  each  technique,  a  description  is  given  of  the  approach 
and  the  compression  for  Documents  1  and  5  is  estimated.  The 
description  of  the  approach  includes  a  sketch  of  the  composition 
of  a  portion  of  a  mixed-mode  transmission.  In  addition,  images 
of Documents 1 and 5 are presented showing the areas transmitted
by symbols (shaded), the lines of symbols constructed (enclosed
by lines), and the area not transmitted because of the use of
an optional CR/LF symbol (crosshatched).

In  the  three  symbol  removal  techniques,  the  black  pels 
associated  with  symbols  recognized  and  coded  are  "removed" 
(changed to white), and then the entire document is encoded
using  the  Modified  READ  code,  including  the  areas  where  the 
symbols  were.  In  the  other  three  techniques,  the  Modified 
READ  code  is  used  only  for  areas  that  do  not  have  encoded 
characters. 

3.1  SYMBOL REMOVAL/SCAN LINE

This  approach  is  similar  to  the  Combined  Symbol  Matching 
(CSM)  Technique  presented  by  Compression  Labs,  Inc.  in 
APPENDIX  B,  but  with  a  stored  library  instead  of  a  transmitted 
library. 


3.1.1  DESCRIPTION 

In this approach the document is scanned, from top to
bottom, and from left to right, until a group of black pels
is encountered that matches a symbol in the stored library.
When this occurs, all the black pels within the rectangular
symbol space are changed to white, and the symbol code and
position are recorded. After the symbols have been removed,
the document is rescanned in principle and encoded using the
Modified READ code (K = ∞, no EOL code). The detected symbol
codes are inserted before the READ code of the scan line in
which the top of the symbol occurs. The presence of a symbol
code, rather than READ code, is indicated by a single bit at the
beginning of every scan line. If the bit indicates that there
are symbols on the scan line, the 8-bit symbol code follows,
and this is followed by an 11-bit horizontal position code word.
(2^11 = 2,048, which is greater than 1,728 pels in a line.) This
may be followed by any additional symbols on the scan line (in
order of horizontal position), and the symbol data is terminated
by a special 8-bit symbol code that indicates that there are no
more symbols on the scan line. Then the READ code for that scan
line (less encoded symbols) is transmitted. Figure 3-1 illustrates
the composition of a mixed-mode message using this segmentation
approach.

Notice  that  in  this  technique  the  recognized  symbols 
will  be  encoded  as  they  are  first  encountered  by  the  scanning 
process,  regardless  of  where  they  appear  relative  to  other 
symbols  or  graphics.  The  vertical  position  of  the  symbols  is 
implied  by  the  scan  line  on  which  the  symbol  code  appears. 
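The per-scan-line framing described above can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: the terminator value 0xFF and all names are hypothetical, and the Modified READ data that would follow the prefix is not modeled.

```python
SYMBOL_BITS = 8          # symbol code width given in the text
HPOS_BITS = 11           # 2**11 = 2,048 covers 1,728 pels per line
PELS_PER_LINE = 1_728
END_OF_SYMBOLS = 0xFF    # hypothetical value for the special terminator code


def scan_line_prefix(symbols):
    """Return the bit string sent ahead of one scan line's READ code.

    symbols: list of (symbol_code, x) pairs for the symbols whose top
    edge falls on this scan line.
    """
    if not symbols:
        return "0"                        # flag bit: no symbols on this line
    bits = ["1"]                          # flag bit: symbol data follows
    # Symbols are sent in order of horizontal position.
    for code, x in sorted(symbols, key=lambda s: s[1]):
        if not 0 <= x < PELS_PER_LINE:
            raise ValueError("horizontal position outside the scan line")
        if not 0 <= code < END_OF_SYMBOLS:
            raise ValueError("symbol code out of range or reserved")
        bits.append(format(code, f"0{SYMBOL_BITS}b"))
        bits.append(format(x, f"0{HPOS_BITS}b"))
    bits.append(format(END_OF_SYMBOLS, f"0{SYMBOL_BITS}b"))
    return "".join(bits)


print(len(scan_line_prefix([])))
print(len(scan_line_prefix([(0x42, 300), (0x41, 120)])))
```

A line without symbols costs one bit; a line with n symbols costs 1 + 19n + 8 bits before its READ code, which is the accounting used in the compression estimates.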

Figures 3-2 and 3-3 illustrate the areas of each document
that are encoded as symbols using the SYMBOL REMOVAL/SCAN LINE
technique.

3.1.2  COMPRESSION  ESTIMATE 
DOCUMENT  1 

All of the typewritten symbols, plus the "SAPORS
LANE" line will be encoded, which is a total of 830 symbols.
(See Figure 3-2.) This will take 8 x 830 = 6,640 bits. A
single bit at the beginning of each scan line indicates the
presence of symbols on the line, which will use 2,376 bits,
since there are 2,376 scan lines. It is assumed that the
horizontal position of the symbol will require 11 bits. (This
could possibly be reduced somewhat by more efficient coding.)
This will take 11 x 830 = 9,130 bits. For each of the 202
scan lines on which symbols are detected, a special 8-bit
symbol code will be used to indicate the end of symbols and the
start of graphics. This will take 8 x 202 = 1,616 bits. Finally
the graphics will take 31,828 bits, as discussed in Section 2.2.2.1.

Summarizing:

    Symbol codes               6,640
    Symbol present             2,376
    Horizontal position        9,130
    End of symbols             1,616
    Graphics                  31,828
                              ------
                              51,590 bits

    Compression = (2,376 x 1,728) / 51,590 = 79.6

DOCUMENT 5

All of the symbols except subscripts and superscripts
will be encoded (see Figure 3-3), for a total of 1,599 symbols.
This will take 8 x 1,599 = 12,792 bits. The horizontal position
will take 11 x 1,599 = 17,589 bits. The end of symbols code will
take 8 x 445 = 3,560 bits, and graphics use 37,894 bits (see
Section 2.2.2.2).

Summarizing:

    Symbol codes              12,792
    Symbol present             2,376
    Horizontal position       17,589
    End of symbols             3,560
    Graphics                  37,894
                              ------
                              74,211 bits

    Compression = (2,376 x 1,728) / 74,211 = 55.3
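The two SYMBOL REMOVAL/SCAN LINE estimates above follow one pattern, which can be sketched as a small tally. The field widths (1, 8 and 11 bits) and the input counts come from the text; the function and argument names are assumptions for illustration.

```python
def scan_line_estimate(symbols, symbol_scan_lines, graphics_bits,
                       scan_lines=2_376, pels_per_line=1_728):
    """Tally message bits for the SYMBOL REMOVAL/SCAN LINE technique."""
    parts = {
        "symbol codes": 8 * symbols,
        "symbol present": 1 * scan_lines,          # one flag bit per scan line
        "horizontal position": 11 * symbols,
        "end of symbols": 8 * symbol_scan_lines,   # one terminator per symbol line
        "graphics": graphics_bits,
    }
    total = sum(parts.values())
    return parts, total, scan_lines * pels_per_line / total


for name, args in {"Document 1": (830, 202, 31_828),
                   "Document 5": (1_599, 445, 37_894)}.items():
    parts, total, ratio = scan_line_estimate(*args)
    print(f"{name}: {total:,} bits, compression {ratio:.1f}")
```

Note that the per-scan-line flag bit is a fixed cost of 2,376 bits regardless of how many symbols the page contains.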
Figure 3-3  SYMBOL REMOVAL/SCAN LINE  Document 5


3.2  SYMBOL REMOVAL/FULL DOCUMENT

This  approach  is  similar  to  SYMBOL  REMOVAL/SCAN  LINE, 
except  that  the  symbols  and  graphics  are  not  interleaved 
in  the  transmitted  message. 

3.2.1  DESCRIPTION 

In this approach the symbols are detected, removed, and
their codes and positions recorded, as in SYMBOL REMOVAL/SCAN
LINE. But in this technique, all the symbol information is
transmitted before any graphics. The vertical position of
the symbols is transmitted using a 12-bit code. (2^12 = 4,096,
which is greater than 2,376 scan lines.) Then the 8-bit
symbol code and 11-bit horizontal position code follow for
the first symbol at that vertical position. If more symbols
have the same vertical position, their symbol codes and
horizontal positions follow. Following all the symbols on
the scan line, a special 8-bit symbol is transmitted to terminate
the scan line. The symbols on the following scan lines follow
in succession. The symbol mode is terminated by a single
12-bit code, possibly a vertical position greater than 2,376.
Then the entire document, with the symbols removed, is
transmitted by Modified READ code. Figure 3-4 illustrates the
composition of a mixed-mode message using the SYMBOL REMOVAL/
FULL DOCUMENT segmentation approach.

Again in this technique the recognized symbols will be
encoded regardless of their position. The vertical position
of each symbol is explicitly stated, but the mode change
between symbols and graphics must be done only once in the
entire document.

The  areas  encoded  as  symbols  by  SYMBOL  REMOVAL/FULL 
DOCUMENT  are  shown  in  Figures  3-5  and  3-6. 


3.2.2  COMPRESSION  ESTIMATE 
DOCUMENT  1 

Again, 830 symbols will be encoded, taking 8 x 830 =
6,640 bits. For each of the 202 scan lines on which a symbol
appears, the vertical position must be given. For a binary code,
this will take 12 bits, or 12 x 202 = 2,424 bits. In addition,
the last symbol on the scan line will be indicated by a special
8-bit code, taking 8 x 202 = 1,616 bits. For each symbol, the
horizontal position is given in an 11-bit binary code, for a
total of 11 x 830 = 9,130 bits. The mode change from symbols to
graphics occurs only once in the entire image, and takes only 8
bits. Finally, the graphics will take 31,828 bits. Summarizing:

    Symbol codes               6,640
    Vertical position          2,424
    Horizontal position        9,130
    Last symbol on scan line   1,616
    Symbols to graphics            8
    Graphics                  31,828
                              ------
                              51,646 bits

    Compression = (2,376 x 1,728) / 51,646 = 79.5



DOCUMENT  5 


All of the symbols except subscripts and superscripts
will be encoded, for a total of 1,599 or 8 x 1,599 = 12,792 bits.
For each of the 445 scan lines on which a symbol appears, a
12-bit vertical position will be given, using 12 x 445 = 5,340
bits. The code for the last symbol on the scan line will take
8 x 445 = 3,560 bits. The horizontal position for each symbol
will take 11 x 1,599 = 17,589 bits. The graphics again take
37,894 bits. Summarizing:

    Symbol codes              12,792
    Vertical position          5,340
    Horizontal position       17,589
    Last symbol on scan line   3,560
    Symbols to graphics            8
    Graphics                  37,894
                              ------
                              77,183 bits

    Compression = (2,376 x 1,728) / 77,183 = 53.2
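The SYMBOL REMOVAL/FULL DOCUMENT tallies can be reproduced with the sketch below. The bit widths and counts are those quoted above, including the single 8-bit symbols-to-graphics code used in the estimate; the function and argument names are illustrative assumptions.

```python
def full_document_estimate(symbols, symbol_scan_lines, graphics_bits,
                           scan_lines=2_376, pels_per_line=1_728):
    """Tally message bits for the SYMBOL REMOVAL/FULL DOCUMENT technique."""
    parts = {
        "symbol codes": 8 * symbols,
        "vertical position": 12 * symbol_scan_lines,
        "horizontal position": 11 * symbols,
        "last symbol on scan line": 8 * symbol_scan_lines,
        "symbols to graphics": 8,                  # one mode change per page
        "graphics": graphics_bits,
    }
    total = sum(parts.values())
    return parts, total, scan_lines * pels_per_line / total


for name, args in {"Document 1": (830, 202, 31_828),
                   "Document 5": (1_599, 445, 37_894)}.items():
    parts, total, ratio = full_document_estimate(*args)
    print(f"{name}: {total:,} bits, compression {ratio:.1f}")
```

Relative to SYMBOL REMOVAL/SCAN LINE, the fixed 2,376 flag bits are replaced by a 12-bit vertical position per symbol-bearing scan line, so the comparison depends on how many scan lines carry symbols.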
Figure 3-5  SYMBOL REMOVAL/FULL DOCUMENT  Document 1

Figure 3-6  SYMBOL REMOVAL/FULL DOCUMENT  Document 5


3.3  SYMBOL REMOVAL/LINE OF SYMBOLS

3.3.1  DESCRIPTION 

In this technique the symbols are detected, removed, and
their codes and positions recorded as in the other symbol
removal approaches. The symbols are then organized into lines
of symbols, based on the symbol position, height, hang down
position, etc. Account is taken of small amounts of line skew,
and a single vertical position is assigned to the entire line of
symbols. When this process is complete, each printed line on the
document should be contained within a line of symbols. Spaces
between symbols are filled by appropriate blank characters, having
several different widths, up to about 2 normal symbol spaces.
If the space between symbols is greater than 2 symbol spaces, the
line of symbols is broken into segments.

The entire document, less recognized symbols, is transmitted
using Modified READ code. When a scan line having the
vertical position of a line of recognized symbols is encountered,
a special 12-bit code (which could be an EOL code) is inserted.
This changes the mode from graphics to symbols. This is followed
by an 11-bit code giving the horizontal position of the first
symbol. Then the symbol codes for each symbol in the segment
are sent, followed by a special 8-bit end-of-segment symbol code.
This is followed by an 11-bit distance to the next segment of
symbols. The last segment of symbols on the line is followed by a
special 8-bit end-of-line symbol code instead of the end-of-segment
code. This changes the mode back to graphics, and the Modified READ
code is continued, until another scan line with a line of symbols
is encountered. Figure 3-7 illustrates the composition of
a mixed-mode message using the SYMBOL REMOVAL/LINE OF SYMBOLS
segmentation technique.

As  with  the  other  symbol  removal  techniques,  a  recognized 
symbol  will  be  encoded  wherever  it  is  located,  since  lines  of 
symbols  may  overlap  vertically,  and  each  line  of  symbols  may 
have  as  few  as  one  symbol.  There  may  be  some  problem  in 
accurately  positioning  symbols,  since  spaces  between  symbols  of 
1  or  2  pels  will  probably  not  be  encoded,  and  the  horizontal 
position  of  a  symbol  could  be  in  error  at  the  end  of  a  long  line 
of  symbols. 

The  symbols  encoded  by  the  SYMBOL  REMOVAL/LINE  OF  SYMBOLS 
technique  are  shaded  in  Figures  3-8  and  3-9,  and  the  lines  of 
symbols  are  enclosed  by  lines. 

3.3.2  COMPRESSION  ESTIMATE 

DOCUMENT  1 

There are 966 symbols to be encoded (includes spaces
between words), taking 8 x 966 = 7,728 bits. There are 24 lines
of symbols with 25 segments. Each line has a 12-bit code to
indicate that there are symbol segments on the line, or 12 x 24 =
288 bits. The horizontal position of the first symbol is given
by an 11-bit binary code, or 11 x 24 = 264 bits. The distance
between segments is also given by an 11-bit binary code. There
is only one of these for 11 bits total. There are two special
8-bit codes for end of segment and end of line. There is one
end of segment code for 8 bits, and 24 x 8 = 192 bits for end
of line codes. Summarizing:

    Symbol codes                7,728
    Symbols on line               288
    Horizontal position           264
    Distance between segments      11
    End of segment                  8
    End of line                   192
    Graphics                   31,828
                               ------
                               40,319 bits

    Compression = (2,376 x 1,728) / 40,319 = 101.8

DOCUMENT  5 

There are 1,887 symbols to be encoded (including
spaces between words), taking 8 x 1,887 = 15,096 bits. It is
assumed that subscripts break a line into segments, since they
are not recognized and cannot be removed. Single character lines
can be encoded. There are 46 lines with 90 segments. The lines
with symbols are indicated by 12 x 46 = 552 bits. The horizontal
position of the first symbol is indicated by 11 x 46 = 506 bits.
The distance between segments is given by 11 x (90-46) = 484
bits. There are 8 x (90-46) = 352 bits for end of segment and
8 x 46 = 368 bits for end of line. Graphics again takes 37,894
bits as given in Section 2.2.2.2. Summarizing:

    Symbol codes               15,096
    Symbols on line               552
    Horizontal position           506
    Distance between segments     484
    End of segment                352
    End of line                   368
    Graphics                   37,894
                               ------
                               55,252 bits

    Compression = (2,376 x 1,728) / 55,252 = 74.3
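The line and segment accounting for SYMBOL REMOVAL/LINE OF SYMBOLS can be sketched as below. The widths and counts are taken from the estimates above; the function and argument names are assumptions introduced for the example.

```python
def line_of_symbols_estimate(symbols, lines, segments, graphics_bits,
                             scan_lines=2_376, pels_per_line=1_728):
    """Tally message bits for the SYMBOL REMOVAL/LINE OF SYMBOLS technique."""
    extra_segments = segments - lines   # segments beyond the first on each line
    parts = {
        "symbol codes": 8 * symbols,                 # includes blank characters
        "symbols on line": 12 * lines,               # graphics-to-symbols code
        "horizontal position": 11 * lines,           # first symbol of each line
        "distance between segments": 11 * extra_segments,
        "end of segment": 8 * extra_segments,
        "end of line": 8 * lines,
        "graphics": graphics_bits,
    }
    total = sum(parts.values())
    return parts, total, scan_lines * pels_per_line / total


for name, args in {"Document 1": (966, 24, 25, 31_828),
                   "Document 5": (1_887, 46, 90, 37_894)}.items():
    parts, total, ratio = line_of_symbols_estimate(*args)
    print(f"{name}: {total:,} bits, compression {ratio:.1f}")
```

Because position is sent once per line or segment rather than once per symbol, the position overhead is far smaller than in the two preceding techniques, at the cost of sending blank characters as symbols.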
Figure 3-7

Figure 3-9  SYMBOL REMOVAL/LINE OF SYMBOLS  Document 5


3.4  EXTENDED TELETEX

This approach was suggested by MBLE (Appendix C), using
the basic TELETEX method to transmit the entire document, except
where there are graphic areas that must be transmitted by
facsimile.

3.4.1  DESCRIPTION 

In  this  approach  the  entire  document  is  divided  into 
character  spaces,  except  for  areas  that  are  defined  as  graphics, 
as  discussed  below.  All  character  spaces,  including  blanks, 
are  transmitted  using  8-bit  symbol  codes. 

The  graphics  are  transmitted  by  Modified  READ  code  as 
they  occur  within  a  line  of  symbols.  First,  a  special  8-bit  symbol 
code  is  used  to  designate  the  transition  from  symbol  codes  to 
graphics.  This  is  followed  by  an  11-bit  code  giving  the  width 
of  the  graphics  area.  (The  height  of  the  graphics  area  is 
defined  by  the  height  of  the  symbol  font.)  Then  the  READ  code 
for  the  graphics  is  sent.  The  length  of  the  READ  code  is 
defined  by  the  width  and  height  of  the  graphics  area,  so  the 
transition  back  to  symbol  codes  does  not  require  a  code. 

As an option, instead of transmitting a series of blank
symbol codes at the right end of the line, a special 8-bit code
could be used to designate the last symbol on the line. This
would also have to be to the right of any graphics on the line.
This symbol would direct the receiver to start on the next line
of symbols, and would replace the CR and LF codes of TELETEX.
For reasons of commonality it may be preferred to keep the two
standard TELETEX symbols for this purpose. Figure 3-10
illustrates the composition of a message using the EXTENDED
TELETEX segmentation technique.

This  technique  is  considered  primarily  as  a  method  to 
incorporate graphics (such as logos and signatures) into
computer-generated  text.  Therefore  graphics  areas  are  defined, 
probably  by  the  user,  as  rectangular  areas  which  may  contain  a 
mixture  of  graphics  and  symbols.  As  a  result,  it  may  be 
difficult  for  a  scanner  to  block  out  a  difficult  document  such  as 
Document  5. 

Since  all  lines  of  symbols  must  have  the  proper  spacing, 
symbols  that  are  not  aligned  with  the  majority  of  the  symbols 
must  be  treated  as  graphics. 

If  lines  of  symbols  exceed  their  normal  vertical  spacing, 
as  in  Document  5,  a  graphics  filler  can  be  inserted  with  any 
number  of  scan  lines  to  keep  the  symbol  grid  aligned  with  the 
symbols  on  the  document.  This  could  be  initiated  with  a  special 
symbol  code. 

The  symbols  encoded  by  the  EXTENDED  TELETEX  technique  are 
shaded  in  Figures  3-11  and  3-12,  with  the  cross-hatched  area 
indicating  space  symbols  that  would  not  be  transmitted  if  the 
CR/LF  symbol  is  used. 

3.4.2  COMPRESSION  ESTIMATE 

DOCUMENT  1 

The character space is 32.6 pels high by 16.3 pels wide.
The number of symbol lines is 2,376 / 32.6 = 73 and the number
of symbols per line is 1,728 / 16.3 = 106. Total symbol spaces
on the entire document is 73 x 106 = 7,738. From this must be
subtracted the 251 symbol spaces in the graphics areas shown in
Figure 3-11, leaving 7,487 character spaces. With 8 bits per
symbol, 8 x 7,487 = 59,896 bits are required. Only 12 symbol
lines have graphics. Each of these lines will need an 8-bit
symbol-to-graphics code, 8 x 12 = 96 bits, and an 11-bit
graphics width code (11 x 12 = 132). Bits to transmit
graphics area will be the same as before, 31,828. Summarizing:

    Symbol codes              59,896
    Symbol to graphics            96
    Graphics width               132
    Graphics                  31,828
                              ------
                              91,952 bits

    Compression = (2,376 x 1,728) / 91,952 = 44.7

With a CR/LF code, many space characters can be saved. In
the lines with symbols, 962 symbols are saved, in 37 blank
lines 3,922 symbols, and in graphics lines 533 symbols, for a
total of 5,417 symbols, or 8 x 5,417 = 43,336 bits. Subtracting
this from the symbol codes leaves a net of 59,896 - 43,336 =
16,560 bits. To this must be added an 8-bit CR/LF code for
each of 73 lines, or 8 x 73 = 584 bits. Summarizing:

    Symbol codes              16,560
    Symbols to graphics           96
    Graphics width               132
    CR/LF                        584
    Graphics                  31,828
                              ------
                              49,200 bits

    Compression = (2,376 x 1,728) / 49,200 = 83.4 with CR/LF


DOCUMENT  5 

The character space is about 0.148" high by 0.059"
wide, or 29 pels high by 11.5 pels wide. The number of symbol
lines is 2,376 / 29 = 82 and the number of symbols per line is
1,728 / 11.5 = 150. The total number of symbol spaces on the
document is 82 x 150 = 12,300, less symbol spaces in the
graphics area, 3,207, or 9,093. Using 8 bits per symbol,
the symbol codes take 8 x 9,093 = 72,744 bits. In 96 cases
a shift from symbols to graphics is required. The symbols
to graphics code will take 8 x 96 = 768 bits, and the graphics
width code will take 11 x 96 = 1,056 bits. To transmit the
graphics area there will be 37,894 bits, plus 138.992 bits
for each of the 30 symbols not encoded because they are in
a graphics area, or 4,170 bits, for a total of 42,064 bits.
Summarizing:

    Symbol codes              72,744
    Symbol to graphics           768
    Graphics width             1,056
    Graphics                  42,064
                             -------
                             116,632 bits

    Compression = (2,376 x 1,728) / 116,632 = 35.2

Using the CR/LF, about 2,828 blank characters on the
right hand margin could be eliminated. This would reduce
the symbol codes by 8 x 2,828 = 22,624 bits to 50,120 bits.
There would have to be a CR/LF code for each line, or 8 x 82 =
656 bits. Summarizing:

    Symbol codes              50,120
    Symbol to graphics           768
    Graphics width             1,056
    CR/LF                        656
    Graphics                  42,064
                              ------
                              94,664 bits

    Compression = (2,376 x 1,728) / 94,664 = 43.4 with CR/LF
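The EXTENDED TELETEX estimates, with and without the optional CR/LF code, can be sketched as follows. All numeric inputs come from the text above; rounding the grid dimensions to the nearest integer is an assumption that happens to reproduce the quoted 73 x 106 and 82 x 150 grids, and the function and argument names are illustrative only.

```python
def teletex_estimate(char_height, char_width, graphics_spaces, transitions,
                     graphics_bits, blanks_saved=None,
                     scan_lines=2_376, pels_per_line=1_728):
    """Tally message bits for EXTENDED TELETEX.

    blanks_saved=None models the basic scheme; a number models the
    optional CR/LF code that replaces that many blank characters.
    """
    rows = round(scan_lines / char_height)      # symbol lines per page
    cols = round(pels_per_line / char_width)    # symbol spaces per line
    spaces = rows * cols - graphics_spaces
    parts = {
        "symbol codes": 8 * spaces,
        "symbol to graphics": 8 * transitions,
        "graphics width": 11 * transitions,
        "graphics": graphics_bits,
    }
    if blanks_saved is not None:
        parts["symbol codes"] -= 8 * blanks_saved
        parts["CR/LF"] = 8 * rows               # one CR/LF code per symbol line
    total = sum(parts.values())
    return total, scan_lines * pels_per_line / total


doc1 = (32.6, 16.3, 251, 12, 31_828)
doc5 = (29, 11.5, 3_207, 96, 42_064)
print(teletex_estimate(*doc1), teletex_estimate(*doc1, blanks_saved=5_417))
print(teletex_estimate(*doc5), teletex_estimate(*doc5, blanks_saved=2_828))
```

The sketch makes the main cost visible: every character space on the page is sent as an 8-bit code, so blank areas dominate unless the CR/LF option is used.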
Figure 3-11  EXTENDED TELETEX  Document 1

Figure 3-12  EXTENDED TELETEX  Document 5


3.5  PARTIAL LINE OF SYMBOLS


3.5.1  DESCRIPTION 

In  this  approach,  the  document  is  searched  for 
recognizable  symbols,  which  are  then  organized  into  lines,  as 
in  SYMBOL  REMOVAL/LINE  OF  SYMBOLS.  The  symbol  codes  and 
horizontal  positions  are  stored,  together  with  the  vertical 
position  of  the  line.  Then  the  document  is  transmitted, 
basically  using  Modified  READ  code.  When  a  line  of  symbols 
is  encountered,  a  12-bit  (EOL)  code  is  used  to  designate  the 
line as containing symbols. The entire line of symbols,
including blanks, is transmitted using symbol codes, unless
there  is  any  graphic  material  contained  within  the  boundaries 
of  the  line.  The  end  of  the  line  is  indicated  when  the  total 
of  the  widths  of  the  symbols  equals  1,728. 

If graphics are contained somewhere within the line, a
special 8-bit code is used to indicate a change to READ code.
This is followed by an 11-bit code giving the width of the
graphics material. (The width could be to the end of line if
there are no more symbols on the line; changes between graphic
and symbol mode are made only when necessary.) This is
followed by Modified READ code defining the graphics area
within the boundaries of the line. The length of READ code is
defined by the height of the line (constant for the font) and the
width of the graphic area. This is then followed by more
symbols, unless the end of the line has been reached.

As an option, trailing blanks in the line can be replaced
by a single 8-bit code indicating the last symbol on
the line. This code tells the receiver that the line is
complete, and the next line, or scan line, will be sent
next. Figure 3-13 illustrates the composition of a message
using the PARTIAL LINE OF SYMBOLS segmentation technique.

Lines  of  symbols  cannot  overlap  vertically  in  this 
technique,  so  symbols  out  of  vertical  alignment  must  be 
treated  as  graphics. 

The  symbols  encoded  by  the  PARTIAL  LINE  OF  SYMBOLS 
technique  are  shaded  in  Figures  3-14  and  3-15  with  the 
cross-hatched  area  indicating  blank  symbols  that  would  not 
be  transmitted  if  the  CR/LF  symbol  is  used. 


3.5.2  COMPRESSION  ESTIMATE 
DOCUMENT  1 

There are 24 lines of symbols and each line has 106
symbol spaces, so the number of symbols is 24 x 106 = 2,544
which will take 2,544 x 8 = 20,352 bits. Each symbol line
has a 12-bit indicator, which is 24 x 12 = 288 bits. No lines
of symbols have interleaved graphics, so there are no symbol
to graphics bits. Summarizing:

    Symbol codes              20,352
    Symbol indicator             288
    Last symbol                    0
    Graphic distance               0
    Graphics                  31,828
                              ------
                              52,468 bits

    Compression = (2,376 x 1,728) / 52,468 = 78.3

If a CR/LF symbol is used, 928 blank symbols are saved, so
the total number of symbols is 2,496 - 928 = 1,568, which take
8 x 1,568 = 12,544 bits. To this must be added 24 CR/LF
symbols, for 8 x 24 = 192 bits. Summarizing:

    Symbol codes              12,544
    Symbol indicator             288
    Last symbol                    0
    Graphic distance               0
    CR/LF                        192
    Graphics                  31,828
                              ------
                              44,852 bits

    Compression = (2,376 x 1,728) / 44,852 = 91.5 with CR/LF



DOCUMENT  5 

There are 36 lines that can be formed into lines of
symbols, and they contain 3,157 symbols, including spaces.
This takes 8 x 3,157 = 25,256 bits. There are 36 symbol
indicator codes @ 12 bits, or 36 x 12 = 432 bits. There are
14 lines that begin with graphics, and each requires a last
symbol code. In addition, there are 39 symbol segments
that terminate with graphics, rather than the end of the line.
Each of these requires a last symbol code, for a total of 53
codes, taking 8 x 53 = 424 bits. Within the lines of symbols
there are 53 segments of graphics, each requiring an 11-bit
width code, for a total of 11 x 53 = 583 bits. In this case
there are 21 symbols not encoded because they do not lie on
the lines of symbols. They will take 138.992 x 21 = 2,919
bits. Added to the 37,894 bits for graphics gives a total
of 40,813 bits for graphics. Summarizing:

    Symbol codes              25,256
    Symbol indicator             432
    Last symbol                  424
    Graphic distance             583
    Graphics                  40,813
                              ------
                              67,508 bits

    Compression = (2,376 x 1,728) / 67,508 = 60.8

Using a CR/LF code, 469 blank symbols at the end of 27
lines can be saved. This will reduce the symbol codes by
8 x 469 = 3,752 bits, to 21,504 bits. To this must be added
27 CR/LF codes, using 8 x 27 = 216 bits. Summarizing:

    Symbol codes              21,504
    Symbol indicator             432
    Last symbol                  424
    Graphic distance             583
    CR/LF                        216
    Graphics                  40,813
                              ------
                              63,972 bits

    Compression = (2,376 x 1,728) / 63,972 = 64.2 with CR/LF
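The PARTIAL LINE OF SYMBOLS tally can be sketched in the same way. The counts, field widths and the 138.992 bits per uncoded symbol are taken from the text; the function and argument names are assumptions made for illustration. Only the Document 1 case without CR/LF and both Document 5 cases are exercised here.

```python
def partial_line_estimate(symbol_spaces, symbol_lines, last_symbol_codes,
                          graphic_segments, graphics_bits,
                          blanks_saved=0, crlf_lines=0,
                          scan_lines=2_376, pels_per_line=1_728):
    """Tally message bits for the PARTIAL LINE OF SYMBOLS technique."""
    parts = {
        "symbol codes": 8 * (symbol_spaces - blanks_saved),
        "symbol indicator": 12 * symbol_lines,
        "last symbol": 8 * last_symbol_codes,
        "graphic distance": 11 * graphic_segments,
        "CR/LF": 8 * crlf_lines,
        "graphics": graphics_bits,
    }
    total = sum(parts.values())
    return total, scan_lines * pels_per_line / total


# Document 5: 21 symbols off the symbol lines are sent as graphics
# at 138.992 bits each, on top of the 37,894 graphics bits.
doc5_graphics = 37_894 + round(138.992 * 21)

print(partial_line_estimate(24 * 106, 24, 0, 0, 31_828))
print(partial_line_estimate(3_157, 36, 53, 53, doc5_graphics))
print(partial_line_estimate(3_157, 36, 53, 53, doc5_graphics,
                            blanks_saved=469, crlf_lines=27))
```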
Figure 3-13

Figure 3-14  PARTIAL LINE OF SYMBOLS  Document 1

Figure 3-15  PARTIAL LINE OF SYMBOLS  Document 5


3.6  FULL LINE OF SYMBOLS


3.6.1  DESCRIPTION 

In  this  approach,  the  document  is  searched  for 
recognizable  symbols,  which  are  then  organized  into  lines, 
as  in  SYMBOL  REMOVAL/LINE  OF  SYMBOLS.  The  symbol  codes  and 
horizontal  positions  are  stored,  together  with  the  vertical 
position  of  the  line.  Then  the  document  is  transmitted, 
basically  using  the  Modified  READ  code.  When  a  line  of 
symbols  only  is  encountered,  a  12-bit  (EOL)  code  is  used  to 
designate  the  line  as  a  symbol  line.  The  entire  line,  including 
blanks,  is  transmitted  using  symbol  codes.  If  any  graphics 
are  contained  within  the  boundaries  of  the  line,  the  entire 
line  is  transmitted  using  the  Modified  READ  code. 

Successive lines of symbols are transmitted without a
requirement for a mode change, until graphics are detected.
At that point, a special 8-bit code is used to signify a return
to the graphics mode.

As  an  option,  trailing  blanks  in  the  line  can  be  replaced 
by  a  single  8-bit  code  indicating  the  last  symbol  on  the  line. 
This  code  tells  the  receiver  that  the  line  is  complete,  and  the 
next  line,  or  scan  line,  will  be  sent  next.  Figure  3-16 
illustrates  the  composition  of  a  message  using  the  FULL  LINE 
OF  SYMBOLS  segmentation  technique. 

Lines  of  symbols  cannot  overlap  vertically  in  this 
technique,  so  symbols  out  of  vertical  alignment  must  be  treated 
as graphics.

The symbols encoded are shaded in Figures 3-17 and 3-18
with  the  cross-hatched  area  indicating  blank  symbols  that 
would  not  be  transmitted  if  the  CR/LF  symbol  is  used. 

3.6.2  COMPRESSION  ESTIMATE 
DOCUMENT  1 

There are 24 lines of symbols, each with 106
symbol spaces, so the number of symbols is 24 x 106 = 2,544,
which will take 8 x 2,544 = 20,352 bits. There are 10
groupings of full lines (blank lines are sent by graphics),
each requiring a 12-bit graphics-to-symbols indicator, or
12 x 10 = 120 bits. At the end of each grouping of lines there
is an 8-bit symbols-to-graphics code, or 8 x 10 = 80 bits. The
graphics are 31,828 bits. Summarizing:

    Symbol codes              20,352
    Graphics to symbols          120
    Symbols to graphics           80
    Graphics                  31,828
                              ------
                              52,380 bits

    Compression = (2,376 x 1,728) / 52,380 = 78.4

Using the CR/LF character, 928 blank symbols can be saved,
reducing the symbol bits by 8 x 928 = 7,424 bits to 12,928
bits, with the addition of 8 x 24 = 192 bits for CR/LF codes.
Summarizing:

    Symbol codes              12,928
    Graphics to symbols          120
    Symbols to graphics           80
    CR/LF                        192
    Graphics                  31,828
                              ------
                              45,148 bits

    Compression = (2,376 x 1,728) / 45,148 = 90.9 with CR/LF

DOCUMENT  5 

Only 6 lines, having 431 symbols, do not have
graphics (including subscripts) associated with them, and
can be coded as a full line of symbols. Even these lines
could not be coded as symbols with the document in its
unmodified state, because of the vertical line down the center
of the page. There are 6 x 150 = 900 symbol spaces to be
encoded, which will take 8 x 900 = 7,200 bits. The full
lines appear in 4 groupings. Each change from graphics to
symbols requires a 12-bit code, or 4 x 12 = 48 bits. Each
change back to graphics requires an 8-bit code, or 8 x 4 = 32
bits. There are 1,599 - 431 - 21 = 1,147 additional symbols that
are not coded as symbols, and these will take 138.992 bits
each, or 1,147 x 138.992 = 159,424 bits as graphics. In
addition, the normal graphics takes 40,813 bits (same as
PARTIAL LINE OF SYMBOLS), for a total of 200,237 bits.
Summarizing:

Symbol codes               7,200
Graphics to symbols           48
Symbols to graphics           32
Graphics                 200,237
                         -------
Total                    207,517 bits

Compression = (2,376 x 1,728) / 207,517 = 19.8

Using the CR/LF code, 144 blank spaces will be saved on
the 6 full lines. This reduces the number of symbol spaces
to 900 - 144 = 756, which takes 8 x 756 = 6,048 bits. In
addition, 6 CR/LF codes are needed, with 8 x 6 = 48 bits.
Summarizing:

Symbol codes               6,048
Graphics to symbols           48
Symbols to graphics           32
CR/LF                         48
Graphics                 200,237
                         -------
Total                    206,413 bits

Compression = (2,376 x 1,728) / 206,413 = 19.9 with CR/LF
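The four FULL LINE OF SYMBOLS estimates above can be checked with a short script. This is only a sketch of the arithmetic in the text: the function name and its parameters are illustrative and do not come from the report, and the page size of 2,376 x 1,728 pels is the 200 lpi figure used elsewhere in the report.

```python
# Page size assumed from the report: 2,376 scan lines x 1,728 pels at 200 lpi.
PAGE_BITS = 2376 * 1728


def full_line_estimate(symbol_spaces, groupings, graphics_bits,
                       blanks_saved=0, crlf_codes=0):
    """Return (total_bits, compression) for a FULL LINE OF SYMBOLS estimate.

    Hypothetical helper: it only repeats the bookkeeping done in the text.
    """
    symbol_bits = 8 * (symbol_spaces - blanks_saved)  # 8 bits per symbol
    to_symbols = 12 * groupings    # 12-bit graphics-to-symbols indicator
    to_graphics = 8 * groupings    # 8-bit symbols-to-graphics code
    crlf_bits = 8 * crlf_codes     # 8 bits per CR/LF code
    total = (symbol_bits + to_symbols + to_graphics
             + crlf_bits + graphics_bits)
    return total, PAGE_BITS / total


doc1 = full_line_estimate(24 * 106, 10, 31_828)
doc1_crlf = full_line_estimate(24 * 106, 10, 31_828,
                               blanks_saved=928, crlf_codes=24)
doc5 = full_line_estimate(6 * 150, 4, 200_237)
doc5_crlf = full_line_estimate(6 * 150, 4, 200_237,
                               blanks_saved=144, crlf_codes=6)

for name, (bits, ratio) in [("Document 1", doc1),
                            ("Document 1 with CR/LF", doc1_crlf),
                            ("Document 5", doc5),
                            ("Document 5 with CR/LF", doc5_crlf)]:
    print(f"{name}: {bits:,} bits, compression {ratio:.1f}")
```

The graphics term dominates for Document 5, which is why the CR/LF saving barely changes its compression.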


Figure 3-16

Figure 3-17 FULL LINE OF SYMBOLS Document 1

Figure 3-18

4. COMMONALITY

In  this  section,  the  commonality  of  a  Mixed  Mode 
machine  with  related  machines  is  discussed.  The  related 
machines  to  be  considered  are: 

(1)  TELETEX  machines 

(2)  Group  4  FACSIMILE  Machines,  without  mixed 
mode  capabilities 

(3)  Group  3  FACSIMILE  machines 

By  commonality  is  meant  the  ability  of  a  Mixed  Mode  machine 
to  transmit  messages  to,  or  receive  messages  from,  these  other 
machines  with  a  minimum  of  modification  from  its  normal 
operation.  Changes  to  the  other  machines  considered  are  not 
permitted,  since  they  will  already  be  in  the  field. 

A  basic  commonality  has  been  designed  into  all  the  Mixed 
Mode  techniques  by  the  use  of  TELETEX  code  and  the  Modified 
READ  II  code  proposed  for  Group  4  FACSIMILE  machines.  The 
Group  4  code  differs  from  Group  3  code  in  that  Group  4: 

(1) uses K = ∞ instead of K = 4 for 7.7 lines/mm.

(2)  deletes  the  EOL  code  for  each  line 

(3)  has  no  provision  for  stuffing  bits  to  achieve 
a  minimum  line  time. 

(4) may allow wrap-around of run length codes over
more  than  one  scan  line. 

4.1  COMMONALITY  WITH  GROUP  3  FACSIMILE 

Commonality with Group 3 machines for the various
techniques will be directly related to the commonality with
Group 4 machines. It is assumed that converting messages
between Group 3 and Group 4 machines will become common.
Therefore there will be no further discussion of commonality
with Group 3 machines, and anything said about commonality
with Group 4 machines will also apply to Group 3 machines,
plus the differences between Group 3 and Group 4 indicated
above.

4.2  COMMONALITY  WITH  GROUP  4  FACSIMILE 

Three of the techniques require almost no changes
to be compatible with Group 4 transmissions. They are SYMBOL
REMOVAL/LINE OF SYMBOLS, PARTIAL LINE OF SYMBOLS, and FULL
LINE OF SYMBOLS. In each of these techniques, Group 4
transmissions can be received without any modifications whatsoever,
and Group 4 transmissions can be produced by simply inhibiting
all symbol recognitions. However, it may be necessary to
inhibit information about the stored library to be used, which
is true for all techniques.

In addition to the above, for SYMBOL REMOVAL/SCAN LINE
and SYMBOL REMOVAL/FULL DOCUMENT, code bits that change mode
must be deleted on transmission, and inserted on reception.
For SYMBOL REMOVAL/SCAN LINE this is the single bit that
precedes each scan line to indicate whether or not there are
symbols on the scan line. For SYMBOL REMOVAL/FULL DOCUMENT,
it is the single 8-bit code that indicates the change from
symbols to graphics.




For EXTENDED TELETEX, a Group 4 transmission can be obtained
by inhibiting all symbol recognitions, including blanks, which
will force the entire line to be transmitted by graphics. In
addition, the codes for the last symbol on the line and the
graphics width would have to be deleted. For reception, the
last symbol code and a graphics width code of 1,728 would have
to be added before each scan line to convert the Group 4
transmission to what the Mixed Mode receiver expects.


4.3  COMMONALITY  WITH  TELETEX 

The  EXTENDED  TELETEX  technique  requires  almost  no 
changes  to  be  compatible  with  TELETEX  transmissions.  No 
change  whatsoever  is  required  to  receive  TELETEX  transmission, 
except  for  adding  the  code  that  identifies  the  stored  library 
to  use.  In  transmission,  the  graphics  mode  must  be  inhibited, 
with  space  symbol  codes  being  transmitted  whenever  material 
that  cannot  be  recognized  as  symbols  is  encountered.  Also 
the  Carriage  Return  (CR)  and  Line  Feed  (LF)  codes  must  be 
inserted  for  each  line. 

For all the techniques except EXTENDED TELETEX, in
transmitting, the graphics mode must be inhibited, and a blank
symbol used to replace each 20 pels of all-white or graphics
pels. CR and LF symbols must be inserted at the end of each
line (approximately 33 scan lines). Corresponding changes must
be made for reception, namely adding coding for approximately
33 all-white scan lines for each LF, and deleting the CR and
LF codes.

In addition, for SYMBOL REMOVAL/LINE OF SYMBOLS, PARTIAL
LINE OF SYMBOLS and FULL LINE OF SYMBOLS, the 12-bit (EOI) code
that indicates a change from graphics to symbol mode must be
deleted on transmission, and added on reception. For SYMBOL
REMOVAL/FULL DOCUMENT the single 8-bit code that indicates the
change from symbols to graphics must be deleted on transmission
and added on reception.

5.0 COMPLEXITY OF IMPLEMENTATION

For the most part, there is not much to choose between
the various techniques in terms of level of complexity of
implementation. In all techniques, the most difficult
problem is to recognize a group of black pels as a symbol,
and to decide which of the symbols in the library it
represents. Additional complexity is encountered by some of
the techniques because of the amount of the image that must
be stored, and due to the fact that in some techniques the
recognized symbols must be organized into lines of symbols.

5.1 IMAGE STORAGE

In  all  of  the  techniques  except  SYMBOL  REMOVAL/FULL 
DOCUMENT,  the  portion  of  the  document  that  must  be  stored 
at  any  one  time  is  an  area  equivalent  to  the  height  of  the 
symbols  by  the  full  width  of  the  document.  Typically  the 
height  of  an  upper  case  character  is  20  pels,  but  with  large 
fonts  and  considering  hang  down  characters,  this  could  be 
as  much  as  32  pels.  Since  the  width  of  the  document  is  1,728 
pels,  the  total  storage  required  is  32  x  1,728  =  55,296  bits. 
This  much  storage  permits  the  recognition  of  symbols,  removing 
them,  and  organizing  them  into  lines. 

For SYMBOL REMOVAL/FULL DOCUMENT, the symbols for the
entire document are transmitted, and then the graphics by
READ code. The entire document must first be scanned for
symbols before any graphics are transmitted. Therefore either
the entire document with symbols removed, or the READ code
for it, must be stored. The storage for the pels would
require of course 2,376 x 1,728 = 4,105,728 bits. For
Document 1, the READ code is only 31,828 bits, and for
Document 5 it is only 37,894 bits (see section 2.2.2).

While even more complex documents may be encountered, it
seems that a storage of much less than the total number of
pels in the document could be used. Another approach is
to scan the document twice, once for symbol recognition
and once to code the graphics portion. In any event, the
amount of storage required for SYMBOL REMOVAL/FULL DOCUMENT
appears to be at least several times that required for the
other techniques, and possibly up to 75 times as much.
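The two storage figures in this section, and the "up to 75 times" ratio between them, follow directly from the page geometry. The sketch below simply repeats that arithmetic; the function name and its default arguments are illustrative, not taken from the report.

```python
def storage_bits(height_pels, width_pels=1728):
    """Bits needed to hold an uncoded bilevel image region (1 bit per pel)."""
    return height_pels * width_pels


# Strip tall enough for one line of symbols (worst case 32 pels high).
strip_bits = storage_bits(32)
# Full page at 200 lpi: 2,376 scan lines of 1,728 pels.
page_bits = storage_bits(2376)

print(f"symbol-height strip: {strip_bits:,} bits")
print(f"full page:           {page_bits:,} bits")
print(f"ratio:               {page_bits / strip_bits:.2f}")
```

The ratio comes out a little under 75, consistent with the upper bound quoted in the text when the whole page is stored as pels rather than as READ code.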


5.2  ORGANIZATION  INTO  LINES 

For the SYMBOL REMOVAL/SCAN LINE and SYMBOL REMOVAL/
FULL DOCUMENT techniques, each symbol that is recognized is
incorporated into the transmission as the recognition occurs.
For the remaining techniques, additional calculations are
required to organize the symbols into lines of symbols that
correspond to lines of print on the document. To do this,
each symbol in the library must have stored with it the
position of a baseline, which could be arbitrary, relative
to the scan line on which the symbol is detected. The vertical
position of each symbol must be adjusted according to this
factor in order to bring all the symbols on a printed line
into vertical alignment. In addition, small amounts of line
skew must be accounted for by using the horizontal position
of each symbol. This can be done by calculating the linear
regression of the horizontal and vertical symbol positions,
to obtain the line skew. This skew is then used to correct
the vertical position of each symbol.
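The baseline adjustment and skew correction described above might be sketched as follows. This is an illustration under stated assumptions, not the report's algorithm: the detection format (x, detected scan line, baseline offset), the sign convention of the offset, and both function names are hypothetical. The fit is an ordinary least-squares regression of vertical on horizontal position and needs at least two symbols with different x values.

```python
def fit_skew(points):
    """Least-squares slope of y on x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def align_line(detections):
    """Bring the symbols of one printed line into vertical alignment.

    detections: list of (x, y_detected, baseline_offset) in pels, where
    baseline_offset is the library value stored with each symbol
    (assumed here to be added to the detected scan line).
    """
    # Step 1: move every symbol to its baseline.
    baselines = [(x, y + offset) for x, y, offset in detections]
    # Step 2: estimate the line skew from the baseline positions.
    slope = fit_skew(baselines)
    # Step 3: remove the skew from each vertical position.
    return [(x, y - slope * x) for x, y in baselines]


# Four symbols on a line skewed by 1 pel per 100 pels, baseline at 100.
sample = [(x, 100 + 0.01 * x - off, off)
          for x, off in [(0, 0), (400, 5), (800, -3), (1600, 2)]]
aligned = align_line(sample)
print(aligned)
```

After correction, symbols whose vertical positions agree to within a small tolerance can be grouped into one line of symbols.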

In addition, for the EXTENDED TELETEX, PARTIAL LINE OF
SYMBOLS, and FULL LINE OF SYMBOLS techniques, it is presumed
that lines of symbols do not overlap, and so algorithms are
required to decide which line of a set of overlapping lines
is to be used for encoding symbols. The simple equations of
Document 5 illustrate the problems that may be encountered in
this area.

6.0 COMPARISON OF ALTERNATIVE TECHNIQUES

The  compression  estimates  calculated  in  Section  3  for 
200  lpi  for  all  the  segmentation  techniques  that  were  evaluated 
are  summarized  in  Table  6-1.  For  comparison,  the  compression 
for normal Group 4 Modified READ (K = ∞, no EOL) is included.
Similar  compression  estimates  were  also  made  for  300  lpi  and 
400  lpi,  using  the  methods  discussed  in  Section  2.3.  The 
results  of  those  calculations  are  shown  in  Tables  6-2  and  6-3. 

Of the six alternative segmentation techniques evaluated
for mixed-mode operation, none appears to have a clear-cut
advantage over the other techniques. However, the SYMBOL
REMOVAL/LINE OF SYMBOLS technique does appear to offer some
advantages over the other techniques. First, the estimated
compression is significantly (at least 10%) better than any
other technique, even assuming a special CR/LF symbol for the
techniques that could use it. The compression is better
for both the simple and complex documents. Second,
commonality with Group 3 and 4 Facsimile is as good as any, and is
significantly better than some, especially EXTENDED TELETEX.
Its commonality with TELETEX is as good as any technique
except EXTENDED TELETEX. Third, in complexity it is as
good as any as far as image storage is concerned, and is better
than SYMBOL REMOVAL/FULL DOCUMENT. Fourth, as far as the
requirement for organizing symbols into lines, it is inferior
only to SYMBOL REMOVAL/SCAN LINE and SYMBOL REMOVAL/FULL
DOCUMENT. In summary, SYMBOL REMOVAL/LINE OF SYMBOLS is
superior in compression and commonality with Group 3 and 4
Facsimile, two very important factors, and is nearly as good
as any segmentation technique as far as the other factors
are concerned.

                                        Document 1              Document 5
Technique                            W/O CR/LF  WITH CR/LF   W/O CR/LF  WITH CR/LF

SYMBOL REMOVAL/SCAN LINE                  79.6           -        55.3           -
SYMBOL REMOVAL/FULL DOCUMENT              79.5           -        53.2           -
SYMBOL REMOVAL/LINE OF SYMBOLS           101.8           -        74.3           -
EXTENDED TELETEX                          44.7        83.4        35.2        43.4
PARTIAL LINE OF SYMBOLS                   78.3        91.5        60.8        64.2
FULL LINE OF SYMBOLS                      78.4        90.9        19.8        19.9
MODIFIED READ (K = ∞, no EOL)             27.9           -        15.8           -

TABLE 6-1

SUMMARY OF COMPRESSION ESTIMATES FOR 200 lpi

                                        Document 1              Document 5
Technique                            W/O CR/LF  WITH CR/LF   W/O CR/LF  WITH CR/LF

SYMBOL REMOVAL/SCAN LINE                 131.4           -        94.5           -
SYMBOL REMOVAL/FULL DOCUMENT             131.2           -        94.9           -
SYMBOL REMOVAL/LINE OF SYMBOLS           164.2           -       124.4           -
EXTENDED TELETEX                          85.6       141.8        67.1        80.5
PARTIAL LINE OF SYMBOLS                  135.1       152.0       105.0       109.4
FULL LINE OF SYMBOLS                     135.3       151.3        30.0        30.1
MODIFIED READ (K = ∞, no EOL)             41.8           -        23.7           -

TABLE 6-2

SUMMARY OF COMPRESSION ESTIMATES FOR 300 lpi

                                        Document 1              Document 5
Technique                            W/O CR/LF  WITH CR/LF   W/O CR/LF  WITH CR/LF

SYMBOL REMOVAL/SCAN LINE                 207.3           -       137.4           -
SYMBOL REMOVAL/FULL DOCUMENT             185.2           -       130.1           -
SYMBOL REMOVAL/LINE OF SYMBOLS           227.6           -       176.1           -
EXTENDED TELETEX                         132.7       202.7       103.4       120.6
PARTIAL LINE OF SYMBOLS                  194.8       214.2       151.5       156.7
FULL LINE OF SYMBOLS                     195.0       213.4        40.9        41.0
MODIFIED READ (K = ∞, no EOL)             55.8           -        31.6           -

TABLE 6-3

SUMMARY OF COMPRESSION ESTIMATES FOR 400 lpi

PARTIAL  LINE  OF  SYMBOLS  appears  to  be  the  second  choice. 
It  offers  better  overall  compression  than  any  technique  except 
SYMBOL  REMOVAL/LINE  OF  SYMBOLS.  FULL  LINE  OF  SYMBOLS  has 
equal  compression  for  the  simple  document  (DOCUMENT  1)  but  it 
is poor for the complex document (DOCUMENT 5). Also PARTIAL
LINE OF SYMBOLS is as good as SYMBOL REMOVAL/LINE OF SYMBOLS
as far as commonality with Group 3 and 4 Facsimile and image
storage complexity are concerned. In organizing lines of
symbols,  additional  logic  is  required  to  prevent  lines  from 
overlapping  vertically,  which  is  not  required  for  SYMBOL 
REMOVAL/LINE  OF  SYMBOLS. 

SYMBOL  REMOVAL/SCAN  LINE,  SYMBOL  REMOVAL/FULL  DOCUMENT, 
and  EXTENDED  TELETEX  (with  CR/LF  symbol)  give  about  equal 
compression  for  both  simple  and  complex  documents.  Of  these 
techniques,  SYMBOL  REMOVAL/SCAN  LINE  has  an  edge  over  SYMBOL 
REMOVAL/FULL  DOCUMENT  because  the  storage  requirement  is  less. 
Both  techniques  are  superior  to  EXTENDED  TELETEX  because  of 
the  latter's  requirement  for  a  complex  algorithm  to  organize 
symbols  into  lines,  and  because  of  its  poor  commonality  with 
Group  3  and  4  Facsimile,  an  important  consideration.  Also 
EXTENDED  TELETEX  was  presented  as  a  technique  for  generating 
a  mixed-mode  message  by  a  computer,  not  a  method  for  scanning 
a  document  and  segmenting  it  into  graphics  and  text. 

Finally, FULL LINE OF SYMBOLS can give poor compression
on complex documents, and requires a complex algorithm to
organize symbols into lines.

Table 6-4 summarizes the subjective evaluations given
to each segmentation technique for each of the topics considered
in the evaluation.

7. RECOMMENDATIONS

The  mixed-mode  segmentation  technique,  SYMBOL  REMOVAL/ 
LINE  OF  SYMBOLS  appears  to  be  the  preferred  approach  and 
should  be  pursued  further. 

Further  study  is  needed  in  the  following  areas. 

1)  Compression  estimates  should  be  made  on  a  wider 
range  of  documents. 

2)  Compression  should  be  measured  for  a  few  selected 
techniques  using  computer  simulation. 

3)  Studies  should  be  made  of  more  efficient  coding 
techniques,  such  as  a  shorter  code  for  graphics  width. 

4) Studies should be made of techniques for organizing
symbol detections into lines of symbols.

Abstract- Recently Study Group XIV of CCITT has drafted a new
Recommendation (T.4) with the aim of achieving compatibility between
digital facsimile apparatus connected to general switched telephone
networks. A one-dimensional coding scheme is used in which run
lengths are encoded using a modified Huffman code. This allows typical
A4 size documents in the form of black and white images scanned at
normal resolution (3.85 lines/mm, 1728 pels/line) to be transmitted in
an average time of about a minute at a rate of 4800 bit/s. The
Recommendation also includes a two-dimensional code, known as the modified
relative element address designate (READ) code, which is in the form
of an optional extension to the one-dimensional code. This extension
allows typical documents scanned at high (twice normal) resolution
(with every fourth line one-dimensionally coded) to be transmitted
in an average time of about 75 s at 4800 bit/s. This paper describes
the coding schemes in detail and discusses the factors which led to their
choice. In addition, this paper assesses the performance of the codes,
particularly in relation to their compression efficiency and vulnerability
to transmission errors, making use of 8 CCITT reference documents.

I.  Introduction 

STUDY GROUP XIV of the CCITT currently defines 2
Recommendations (T.2 and T.3) for the transmission of
documents by facsimile over the general switched
telephone network. These refer to Groups 1 and 2 type facsimile
apparatus and allow A4 size documents scanned at 3.85
lines/mm to be transmitted in 6 and 3 min, respectively.
Faster transmission cannot be readily obtained using analogue
techniques because of the restricted bandwidth of voiceband
telephone circuits. However, many documents that are likely
to be transmitted, such as business letters, forms, and diagrams,
can be satisfactorily reproduced when quantized and
transmitted in the form of two tones, i.e., black and white. Hence,
a new Recommendation (T.4) has been drafted for Group 3
type apparatus which relies on digital transmission techniques.
The aim of the standard is to enable two-tone A4 documents
scanned at a (normal) resolution of 3.85 lines/mm and
sampled at 1728 samples/line to be transmitted at 4800 bit/s in an
average time of about 1 min over the general switched
telephone network. It is hoped that the draft Recommendation
T.4 will be ratified by CCITT in late 1980.

In order to achieve this transmission time, the amount of
digital data representing the document image must be reduced
by source coding. At the end of 1977, SGXIV of CCITT
agreed that Group 3 equipment should use a one-dimensional
run-length coding scheme and a modified Huffman code. A
one-dimensional code was chosen as a suitable compromise
between obtaining a high compression efficiency while
minimizing the susceptibility to transmission errors and keeping
the implementation costs to a low level. Its freedom from
patent restrictions was also an important consideration. In
1979, an optional two-dimensional coding scheme, in the form
of an extension to the one-dimensional coding scheme, was
added to the Recommendation T.4 for Group 3 apparatus.
The two-dimensional scheme allows greater compression
efficiency to be obtained for many documents, particularly
when they are scanned at twice normal vertical resolution.
The additional factors considered when choosing this code were
compatibility with the one-dimensional code and possible
future extensions to other codes.

In this paper, we describe both of the coding schemes
included in the Recommendation T.4, as well as the factors
which influenced the selection of these schemes and assess
their performances. Results are given on coding statistics and
compression efficiencies, as measured by computer simulation
on 8 CCITT reference documents (Figs. 1 and 2). We have used
2 test procedures for assessing the error susceptibility of the
schemes. The results provide useful information about the
likely extent of damage to documents caused by transmission
errors and of the effectiveness of several methods which may
be used to reduce the subjective effects of such damage.

Many companies and national telecommunications
administrations undertook similar evaluation studies, and it was only
through widespread collaboration and agreement under the
auspices of the CCITT that it was possible to draft a
satisfactory Recommendation T.4.

Section II discusses some of the factors considered in the
choice and design of the one-dimensional coding scheme.
Section III outlines the standards for Group 3 apparatus and
specifies parameters such as document size, resolution,
scanning rate, and modulation methods. Section IV describes the
Group 3 one-dimensional coding standard and Section V
summarizes its performance, particularly in respect to its
susceptibility to transmission errors. Section VI outlines the criteria
used to select the two-dimensional coding scheme described
in Section VII. In Section VIII, we give the compression
efficiency results for this code and discuss briefly some error
susceptibility measurements.

II. Criteria for Selecting a One-Dimensional
Group 3 Code

As stated in the Introduction, the main purpose of the
Group 3 standard is to allow typical A4 documents to be
transmitted digitally over telephone networks in an average
time of 1 min. The digital image representing a document
is obtained by scanning it and quantizing each sample or pel
into one of two logical levels, representing black and white.
At a normal Group 3 scanning density of 3.85 lines/mm
(about 1188 lines on an A4 page) with each line divided into
1728 pels, the amount of data generated would require, in the
absence of source coding, a transmission time of just over 7
min at a transmission rate of 4800 bit/s. However, due to the
strong statistical dependencies between pels, the transmission
times for most documents can be considerably reduced with a
suitable source coding method.
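The "just over 7 min" figure can be reproduced from the numbers quoted in the paragraph above. The sketch below is only a restatement of that arithmetic; the function name and its defaults are illustrative.

```python
def uncoded_transmission(lines=1188, pels_per_line=1728, bit_rate=4800):
    """Return (bits, minutes) to send an uncoded two-tone page."""
    bits = lines * pels_per_line   # one bit per pel
    minutes = bits / bit_rate / 60
    return bits, minutes


bits, minutes = uncoded_transmission()
print(f"{bits:,} bits, {minutes:.2f} min uncoded at 4800 bit/s")
# To meet a 1 min target the data must shrink by the same factor.
print(f"compression needed for a 1 min page: about {minutes:.1f}")
```

Because the target is one minute, the uncoded time in minutes is also the average compression factor the source code has to deliver.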

The relative importance of factors such as compression
efficiency, error susceptibility, and complexity of implementation
depends upon each facsimile application. A high
compression efficiency is particularly important for high volume usage.
In other applications, the need is for reliable equipment
providing acceptable copies over national or international telephone
networks at reasonable terminal costs. In addition, the
mechanical limitation of some equipment must be taken into
account, and machines with a wide range of facilities must be
able to interwork with more basic equipment.

Based on these criteria, a one-dimensional run-length coding
scheme using a modified Huffman code was chosen as a basic
Group 3 standard. Run-length codes have been widely
employed in data reduction systems and are easy to implement.
In most cases, Huffman codes offer high compression
efficiencies and the use of a “modified” code simplifies
implementation. Damage caused by errors is kept to acceptable levels by
using a single line coding scheme and by transmitting a robust
line synchronization codeword. Other codes were investigated
but were found to be equally susceptible to errors and
generally performed less efficiently.

A.  Compression  Efficiency 

Numerous source coding techniques have been applied to
digital facsimile signals in order to reduce the amount of data
required to transmit them. Data reduction is achieved by
using a code which exploits the statistical dependencies
between pels. The facsimile signal is characterized by a source
alphabet for which source symbols have low statistical
dependency. A code table is used which provides a statistically
optimum match between the source alphabet and the
codewords. Codes which make use of only the horizontal
dependencies between pels on the same scan line are usually classed
as one-dimensional codes, while two-dimensional codes attempt
to obtain greater efficiency by exploiting both horizontal
and vertical dependencies. Shannon’s theory of
communication [4] indicates the amount of data reduction that can be
achieved by statistical source coding methods, but the actual
statistical dependencies between pels can only be determined
by experiment.

A useful model of a digital image has been proposed by
Capon [5]. Each scan line is regarded as a first-order Markov
chain in which the color of each pel $X_t$ is dependent only on
the color of the previous pel $X_{t-1}$. The average amount of
information per pel is given by the entropy $H_{pel}$, where

$$H_{pel} = -\sum_{X_{t-1}} \sum_{X_t} P(X_{t-1}, X_t) \cdot \log_2 P(X_t \mid X_{t-1}) \qquad (1)$$

where the summation is taken over all possible combinations
of 2 adjacent pels.
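As an illustration of (1), the first-order Markov entropy can be estimated from measured pel-pair frequencies. This is a minimal sketch, not a procedure from the paper: the function name is hypothetical, pels are assumed to be coded 0 (white) and 1 (black), and pairs are counted within each scan line only.

```python
from collections import Counter
from math import log2


def markov_entropy(lines):
    """Estimate the entropy per pel of (1), in bits, from scan lines.

    lines: iterable of sequences of 0/1 pels (0 = white, 1 = black).
    """
    pair_counts = Counter()
    for line in lines:
        # Count every (previous pel, current pel) pair on the line.
        pair_counts.update(zip(line, line[1:]))
    total = sum(pair_counts.values())

    # Number of pairs that start with each color, for P(X_t | X_{t-1}).
    prev_counts = Counter()
    for (prev, _), count in pair_counts.items():
        prev_counts[prev] += count

    entropy = 0.0
    for (prev, _), count in pair_counts.items():
        p_joint = count / total
        p_conditional = count / prev_counts[prev]
        entropy -= p_joint * log2(p_conditional)
    return entropy


# Runs of exactly two pels: the next pel is a coin flip, so 1 bit per pel.
print(markov_entropy([[0, 0, 1, 1, 0, 0, 1, 1, 0]]))
# An all-white line is fully predictable.
print(markov_entropy([[0] * 16]))
```

On real documents the long white runs make most transitions highly predictable, which is the dependency a one-dimensional code exploits.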


This model leads to run-length coding techniques in which
the digital image is regarded as a sequence of alternating
independent runs of black and white pels. Two source alphabets
are formed, one consisting of all the black run-length values
and the other containing the set of white runs. The average
white run-length value is given by

$$\bar{r}_w = \sum_{r=0}^{n} r \cdot P_w(r) \qquad (2)$$

where $P_w(r)$ is the probability of a white run of length $r$ and
$n$ is the largest value of $r$. The average amount of
information in bits for each white run is given by the entropy

$$H_w = -\sum_{r=0}^{n} P_w(r) \cdot \log_2 P_w(r). \qquad (3)$$

Similar equations can be written for the average black
run-length value $\bar{r}_b$ and the entropy of the black runs $H_b$.

The entropy per pel $H_{pel}$ and the maximum theoretical
compression factor $Q_{max}$ for a given set of run-length values
are given by

$$H_{pel} = \frac{H_w + H_b}{\bar{r}_w + \bar{r}_b}, \qquad Q_{max} = \frac{1}{H_{pel}} = \frac{\bar{r}_w + \bar{r}_b}{H_w + H_b} \qquad (4)$$

$H_{pel}$ in (4) is usually higher than $H_{pel}$ for the Capon model
given in (1) and indicates that the run-length coding model
includes some of the higher order dependencies between
successive pels of the same color. There are, of course,
dependencies between adjacent runs on the same line which have
been ignored in the above analysis. However, these
dependencies have been found to be small [6], providing about a
10 percent decrease in the value of $H_{pel}$.

In the above analysis, the white and black runs have been
placed in separate source alphabets. If the black and white
run-length distributions are combined into one source alphabet
with entropy $H_c$, then $H_c \ge \frac{1}{2}(H_w + H_b)$ with equality
occurring only when $P_w(r) = P_b(r)$ for all values of $r$.
Measurements on typical documents have shown that $Q_{max}$ is increased
on average by about 25 percent if separate distributions are
used instead of a single distribution. It is generally agreed that
the advantage of this increase in the available efficiency
outweighs the disadvantage of having to use two separate code
tables.
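The run-length quantities discussed in this subsection (mean run lengths, run entropies, entropy per pel, maximum compression factor, and the entropy of a combined black-and-white alphabet) can be measured on sample data as sketched below. The function names and the toy scan lines are illustrative assumptions, and the sketch assumes both colors occur and that white and black runs alternate in roughly equal numbers, as in the analysis above.

```python
from collections import Counter
from itertools import groupby
from math import log2


def entropy(counts):
    """Entropy in bits of the distribution given by a Counter of runs."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def mean_length(counts):
    """Average run length for a Counter mapping length -> occurrences."""
    total = sum(counts.values())
    return sum(length * c for length, c in counts.items()) / total


def run_length_stats(lines):
    """Measure run-length statistics on 0/1 scan lines (0 = white)."""
    white, black = Counter(), Counter()
    for line in lines:
        for color, group in groupby(line):
            (black if color else white)[len(list(group))] += 1
    h_w, h_b = entropy(white), entropy(black)
    h_pel = (h_w + h_b) / (mean_length(white) + mean_length(black))
    return {
        "H_w": h_w,
        "H_b": h_b,
        "H_pel": h_pel,
        "Q_max": 1 / h_pel if h_pel > 0 else float("inf"),
        "H_c": entropy(white + black),  # single combined alphabet
    }


# Toy data: white runs of 4 and 6 pels, black runs of 1 and 2 pels.
sample = [[0] * 4 + [1] * 2 + [0] * 6 + [1] * 2,
          [0] * 6 + [1] * 1 + [0] * 4 + [1] * 1]
stats = run_length_stats(sample)
print(stats)
```

On this toy data the combined alphabet needs twice the bits per run of either separate alphabet, an exaggerated version of the penalty that motivates keeping two code tables.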

Many different codes have been designed for use in
run-length coding. However, it can be shown that Huffman’s
procedure [7] is the optimum method of constructing a
uniquely decodable and instantaneous code which has, for a
given independent source alphabet, the smallest average
codeword length. Such a code is usually called a compact
code. For example, the average number of coded bits per
run for the compact code designed for the white run lengths
will be in the range

$$H_w \le \sum_{r} P_w(r) \cdot n_w(r) < H_w + 1 \qquad (5)$$

where $n_w(r)$ is the length of the codeword representing the
white run-length $r$. The code is instantaneous since it is a
prefix code for which no codeword can appear as the
beginning of any other codeword in the same code table. Thus
codewords can be decoded as soon as they are received. Also
the code is exhaustive so that any sequence of binary digits
can be decoded.

The Huffman code tables used in the Group 3 standard
were designed from the run-length statistics averaged over
many typical documents. Two Huffman codes would result
in a large number of codewords (1728 in each) and hence a
“modified” code was designed in which every run length
greater than 63 is broken into 2 run lengths, namely a make-up
run length having a value N x 64 (where N is an integer) and a
terminating run length having a value between 0 and 63. This
reduces the number of codewords required and simplifies
implementation. The extra run lengths required lead to
modified source statistics but the reduction in efficiency is
small. Furthermore, measurements have shown that the
modified Huffman code tables used in the Group 3 standard
are reasonably insensitive to considerable changes in the source
statistics and most documents can be transmitted with high
efficiency.

B. Error Susceptibility

An acceptable digital facsimile system for use on the general
switched telephone network must include some method of
dealing with transmission errors and of limiting their effects.
Three methods have been considered.

1) restriction of the damage caused by errors to as small an
area on the document as possible;

2) detection of errors and retransmission of blocks of data
in error using an automatic repeat request (ARQ) system;

3) detection of errors and their correction at the receiver
using a forward-acting error correcting (FEC) code.

The Group 3 coding system has been designed so that the

reduction factor is often referred to as the throughput. For
example, following Burton’s [8] analysis for a 4.8 kbit/s V27
ter modem, a “stop-and-wait” ARQ system has a throughput
as low as 0.85 on a national terrestrial circuit and as low as
0.27 on a single satellite circuit. For a continuous ARQ
system, which requires an asymmetrical full duplex modem
with a backward channel rate of 75 or 150 bit/s, the
corresponding throughputs are as low as 0.89 and 0.84, respectively.
FEC codes have higher effective transmission rates but must
be designed for the errors experienced on practical circuits and
can be very sensitive to changes in those error patterns.
Recently, an implementation of a Reed-Solomon code on a
microprocessor has been demonstrated [9]. This system is
capable of correcting a single burst error spread over up to
17 bits in a block of 255 bytes, and has a throughput of
greater than 0.97. However, it is difficult to assess the
performance of this system since, in general, the error
characteristics of most switched telephone networks are not
sufficiently well known.

C. Complexity of Implementation

In the past, Huffman codes have not been used in document
facsimile systems because they were generally regarded as
difficult to implement. For example, Huffman decoders
usually relied on simple table-lookup or tree-follower methods
which require a large amount of storage and are slow in
operation. It was also thought that the codes did not recover
quickly from errors. Instead, a number of
other codes, comma or block codes [3], [10], [11], [12],
were proposed for facsimile use. Although they were easier to
d»m»gg  caused  by  a  transmission  error  is  confined,  when  pos-  implement,  they  usually  gave  somewhat  lower  compression 
sible,  to  the  scan  line  in  which  the  error  occurs.  This  is  achieved  factors  and  were  more  sensitive  to  changes  in  document 
by  transmitting  a  special  synchronizing  sequence  called  the  statistics.  Recently,  methods  of  decoding  Huffman  codes  have 
end-of-line  (EOL)  codeword  at  the  end  of  each  coded  line,  been  devised  which  are  fast  and  require  a  modest  amount  of 
This  codeword  is  unique  since  it  consists  of  a  sequence  of  storage  [131,  [14).  Furthermore,  after  the  modified  Huffman 
digits  which  cannot  occur  naturally  anywhere  in  a  scan  line  code  had  been  proposed  for  the  Group  3  standard,  in  recovery 
of  coded  date  (see  Section  IV-C).  It  can  therefore  be  easily  properties  were  studied  is  detail.  From  this  investigation  it 
recognized  and  although  a  coded  line  is  damaged  by  a  trass-  was  concluded  that  the  code  performed  satisfactorily  in 
mission  disturbance,  all  subsequent  lines  can  be  correctly  respect  of  its  susceptibility  to  errors. 

received  and  decoded.  This  method  has  a  further  advantage;  One  code  in  particular,  the  intermediate  ternary  code  (ITC) 
since  each  correctly  decoded  line  represents  exactly  1728  [31,  [151  was  considered  as  a  possible  alternative  code  for 

pelt,  a  damaged  line  containing  a  different  number  of  de-  Group  3  equipment  since  it  was  thought  that  in  the  presence 
coded  pels  can  be  detected.  The  receiver  may  then  be  able  to  of  errors,  it  might  perform  better  than  a  Huffman  code.  In 
reduce  the  subjective  effect  of  the  damage  on  the  document  the  ITC  scheme,  the  binary  numbers  representing  the  values 
by  employing  one  of  the  error  concealment  processes  described  of  the  run  lengths  are  converted  to  a  ternary  state  which 
in  Section  V-D.  distinguishes  between  black  and  white  runs.  Ternary  pentades 

ARQ  and  FEC  methods  were  considered  but  neither  have  are  then  converted  into  binary  octades  which  form  the  coded 
been  incorporated  in  the  Group  3  standard  since  they  are  data.  In  its  basic  form,  this  code  does  not  require  a  code  table 
complex,  add  extra  coat  to  the  equipment,  and  increase  the  and  therefore  the  implementation  is  simple.  Also,  since  the 
transmission  times.  Furthermore,  there  is  insufficient  evidence  codewords  are  all  8  bits  long,  there  can  be  no  loea  of  code- 
to  show  whether  either  of  these  methods  would  be  suitable  on  word  synchronization.  However,  a  comparison  of  the  effects 
general  telephone  networks  and  manufacturers’  field  expen-  of  errors  on  the  ITC  code  with  the  effects  on  the  modified 
ence  indicates  chat  acceptable  document  quality  can  be  Huffman  code  [16],  indicated  that  there  was  very  little 
obtained  for  moat  facsimile  oils  without  the  use  of  error  practical  difference  between  the  codes.  In  addition,  the  ITC 
correction  methods.  However,  studies  indicate  that  error  code  produced  slightly  lower  compression  factors  than  the 
control  techniques  may  be  applicable  to  future  facsimile  modified  Huffman  code, 
equipment.  _ _  „  _ 

ARQ  systems  have  the  advantage  of  being  very  reliable  and  ra-  Elements  of  the  Group  3  Recommendations 
insensitive  to  changes  in  the  channel  error  rate.  Systems  can  The  specifications  needed  to  provide  for  interworking  of 
be  designed  such  that  the  probability  of  an  undetected  error  Group  3  facsimile  equipment  over  the  general  switched  tele¬ 
in  a  block  (of  say  2048  bits)  in  laaa  than  1  in  10* .  However,  phone  network  are  given  in  CCITT  Recommendations  T.4  and 
they  lead  to  a  reduction  in  the  effective  transmission  rate-the  T.30.  Recommendation  T.4  contains  standards  concerning 


the following parameters: document size, resolution, scanning
rate, source coding methods, and modulation method.
Associated with these are a number of options which enable
facsimile equipment to communicate in alternative modes,
e.g., a higher vertical resolution is provided so that higher
quality copies can be obtained. Two facsimile machines may
communicate using any of the options by mutual agreement;
otherwise they must use the appropriate recommended
standard.

Recommendation T.30, which is not covered by this paper,
specifies the digital signals and procedures used by Group 3
apparatus for: call setup, premessage procedures for identifying
and selecting the required facilities, message transmission,
postmessage procedure, and call release. The following
subsections summarize the parameters defined in T.4.

A.  Dimensions  of  Apparatus 

1)  Facsimile  machines  should  be  able  to  accept  A4  size 
documents.  As  an  option,  documents  up  to  A3  in  size  may 
be  transmitted  with  the  same  resolution. 

2)  The  normal  vertical  resolution  is  3.85  lines/mm.  A  high 
vertical  resolution  of  7.7  lines/mm  is  available  as  an  option. 

3) Each scan line on an A4 document is divided into 1728
black  and  white  pels.  The  number  of  pels  may  be  optionally 
increased  to  about  2600  to  allow  documents  up  to  A3  size  to 
be  transmitted  at  the  same  resolution. 

4)  The  scanning  line  length  on  an  A4  document  is  215  mm. 
Other line lengths may be used provided that the vertical
resolution is adjusted to maintain the correct picture proportions.

The  normal  vertical  resolution  is  chosen  to  be  the  same  as 
that  used  in  Groups  1  and  2.  A  higher  resolution  is  included 
to  allow  higher  quality  copies  to  be  obtained.  A  horizontal 
resolution  which  is  nearly  twice  that  of  the  normal  vertical 
resolution  is  required  to  ensure  that  staircase  effects  on  vertical 
black  and  white  edges  due  to  the  sampling  and  quantizing 
processes  do  not  impair  legibility. 

B.  Minimum  Scan  Line  Times  and  Message  Format 

These are specified so that transmitters and receivers can
keep in step on a line-by-line basis and to allow for mechanical
limitations of some machines. Also, some receivers operate
at normal resolution by printing each scan line twice at high
resolution. The recommended standard minimum scan line
time (MSLT) is 20 ms (equivalent to a minimum of 96 coded
bits at a transmission rate of 4800 bit/s) and there are options
of 10 ms (48 bits), 5 ms (24 bits) and 0 ms (i.e., no
restriction). Any machine offering an option must be able to
operate at all longer MSLTs up to 20 ms. The
recommendation also includes a 40 ms option. The coding
procedure includes a method of adding varying length strings of
“fill” bits to those coded lines containing fewer than the
required number of bits. Fill bits are easily recognized by the
receiver and are discarded.
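
The relation between the MSLT and the minimum number of coded bits quoted above can be checked with a short Python sketch. The function names are illustrative only and do not come from the Recommendation; whether the EOL codeword counts toward the minimum is left to the caller, who passes whatever line length is to be compared.

```python
def min_coded_bits(rate_bit_s: int, mslt_ms: int) -> int:
    """Minimum number of bits one coded line must occupy for a given MSLT."""
    return rate_bit_s * mslt_ms // 1000


def fill_bits_needed(coded_len: int, rate_bit_s: int, mslt_ms: int) -> int:
    """Number of fill bits required to bring a coded line up to the minimum."""
    return max(0, min_coded_bits(rate_bit_s, mslt_ms) - coded_len)


# The 20, 10 and 5 ms options at 4800 bit/s give 96, 48 and 24 bits.
for mslt in (20, 10, 5, 0):
    print(mslt, "ms ->", min_coded_bits(4800, mslt), "bits")
```

For instance, a coded line of 30 bits sent at 4800 bit/s with a 20 ms MSLT would need 66 fill bits under these assumptions.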

Fig. 3 shows the format of the data for several coded lines.
The end of document transmission is indicated by 6 consecutive
EOL codewords which form the return to control (RTC)
signal.

C.  Modulation  and  Demodulation  Methods 

When operating on the general switched telephone network,
it is recommended that Group 3 equipment should use data
rates of 4800 bit/s and 2400 bit/s and the modulation,
scrambler, equalization, and timing signals defined in CCITT
Recommendation V27 ter. However, where higher speeds of
operation are possible, it has been agreed that Group 3 equipment
may operate optionally at 9600 bit/s and 7200 bit/s using
the modulation, scrambler, equalization, and timing signals
defined in CCITT Recommendation V29. Some PTT
Administrations point out, however, that it may not be possible to
guarantee the service at a data signalling rate higher than
2400 bit/s.

IV.  One-Dimensional  Coding  Scheme 

This code was first suggested by the Plessey Company in
1976. Later a revised version of the code was proposed
jointly by a number of British and American companies under
the auspices of the British Facsimile Industries Compatibility
Committee (BFICC) and the Electronic Industries Association
(EIA) [17]. It is this version of the code that was eventually
accepted by SG XIV of the CCITT. The extended code table
described in Section IV-B is due to a proposal made by the
British Post Office.

A.  The  Coding  Scheme 

Each scan line is regarded as a sequence of alternating black
and white runs. All lines are assumed to begin with a white
run to ensure that the receiver maintains color synchronization;
if the first actual run on a line is black, then a white run
of zero length is transmitted at the beginning of the line.
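
As a rough illustration of this rule, the following Python sketch converts one scan line into alternating run lengths and inserts a zero-length white run when the line starts with black. The names WHITE, BLACK and run_lengths are illustrative and not part of the Recommendation; pels are assumed to be supplied as a list of 0 (white) and 1 (black).

```python
from itertools import groupby

WHITE, BLACK = 0, 1  # assumed pel values for this sketch


def run_lengths(pels):
    """Return alternating run lengths, always starting with a white run."""
    runs = [len(list(group)) for _, group in groupby(pels)]
    if pels and pels[0] == BLACK:
        runs.insert(0, 0)  # zero-length white run keeps the colors in step
    return runs


print(run_lengths([0, 0, 1, 1, 1, 0]))  # white 2, black 3, white 1
print(run_lengths([1, 1, 0]))           # white 0, black 2, white 1
```

Because the first entry is always a white run, the receiver can recover the color of every run from its position in the list alone.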

Separate code tables are used to represent the black and
white runs and these are given in Table I. Each code table can
represent a run-length value up to the maximum length of one
scan line (1728 pels) and contains two types of codewords:
terminating codewords (TC) and make-up codewords (MUC).
Runs between 0 and 63 pels are transmitted using a single
terminating codeword. Runs between 64 and 1728 are
transmitted by a MUC followed by a TC. The MUC represents a
run-length value of 64 × N (where N is an integer between 1
and 27) which is equal to, or shorter than, the value of the
run to be transmitted. The following TC specifies the
difference between the MUC and the actual value of the run to
be transmitted.
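
The split of a run into make-up and terminating values can be sketched in a few lines of Python. The function name split_run is hypothetical, and the sketch returns run-length values only; the actual codewords would still have to be looked up in Table I.

```python
from typing import Optional, Tuple


def split_run(run: int) -> Tuple[Optional[int], int]:
    """Split a run length into (make-up value or None, terminating value)."""
    if not 0 <= run <= 1728:
        raise ValueError("run length must lie between 0 and 1728 pels")
    if run < 64:
        return None, run          # a single terminating codeword suffices
    makeup = (run // 64) * 64     # 64 x N with N between 1 and 27
    return makeup, run - makeup   # terminating value lies between 0 and 63


print(split_run(200))   # make-up 192, terminating 8
print(split_run(1728))  # make-up 1728, terminating 0
```

Note that a run which is an exact multiple of 64 is still followed by a terminating value of 0, so every run ends with a terminating codeword.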

The coding of each scan line continues until all runs on the
line (i.e., a total of 1728 pels) have been transmitted. Each
coded line is followed by the EOL codeword. As stated in
Section II-B, the EOL codeword is a unique sequence which
cannot occur within a valid line of coded data. It can be
detected irrespective of the way in which the decoder breaks
up the coded line into codewords. Thus, if a transmission
error corrupts some of the coded scan line data, then the error
cannot prevent the EOL from being detected.

If the number of coded bits in a line is fewer than a certain
agreed minimum (Section III-B), then “fill” bits consisting of
varying length strings of “0”s are inserted between the line of
coded data and the EOL codeword.

B. The Extended Code Tables

The Group 3 standard provides an optional extension to the
coding scheme allowing machines to transmit larger paper
widths up to A3 in size which require nearly 2600 pels/line.
This option is provided by 2 extended code tables formed by
adding 13 extra MUC listed in Table II to each of the basic
code tables given in Table I. The construction of the extra
codewords is described in Section IV-C. The use of the
extended code table is signalled in the Recommendation T.30
control procedures.

C. Construction of the Modified Huffman Code Tables

The properties of the EOL codeword can be further
understood by considering the construction of the modified Huffman
code tables. Each code table was initially designed
according to Huffman’s procedure and to contain the codeword
“0000000” (7 × “0”) which was designated to signal the end
of a scan line. Redundant bits were then added to the
codeword 7 × “0” to form the codeword 10 × “0” + “1”. By
examining Table I, it can be seen that no codeword ends in
a sequence of more than 3 “0”s or begins with a sequence of
“0”s larger than 6 and therefore 10 × “0” + “1” forms a
unique sequence which cannot be produced by a
concatenation of codewords. The final “1” of the EOL is included to
indicate the start of the next coded line, since fill bits may
extend the sequence beyond 10 × “0”.

The extended black and white code tables were formed
using the codeword 7 × “0” as the prefix for the 13 extra
MUC. The 7 × “0” codeword originally designated to signal
the end of a scan line now needs to be increased to 8 × “0”.
Redundant bits are then added to this codeword to form the
EOL codeword 11 × “0” + “1” which is unique using either
the basic or extended code tables. This process can be carried
out without altering any of the other codewords in Table I.
The same 13 extra codewords can be added to each code
table without a loss in efficiency since long runs occur very
infrequently.
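
The framing of a coded line (coded data, optional fill bits, EOL) and the detection of the EOL can be sketched as follows. This is a minimal Python illustration under stated assumptions: bits are held in a string of "0" and "1" characters, the EOL is taken as 11 × “0” + “1”, and the names frame_line and eol_ends are hypothetical. The argument min_bits is assumed to count coded data plus fill bits only.

```python
import re

EOL = "0" * 11 + "1"                  # EOL codeword: 11 x "0" followed by "1"
EOL_PATTERN = re.compile(r"0{11,}1")  # fill bits may lengthen the run of "0"s


def frame_line(coded_bits: str, min_bits: int) -> str:
    """Append fill bits (if needed) and the EOL codeword to one coded line."""
    fill = "0" * max(0, min_bits - len(coded_bits))
    return coded_bits + fill + EOL


def eol_ends(stream: str) -> list:
    """Return the bit offsets just after each EOL found in the stream."""
    return [match.end() for match in EOL_PATTERN.finditer(stream)]


stream = frame_line("1011", 0) + frame_line("11", 6)
print(eol_ends(stream))  # offsets at which the next coded line would start
```

The search needs no knowledge of codeword boundaries, which reflects the property that the EOL can be found however the decoder has split the line into codewords. Because the pattern also absorbs any "0"s that end the last codeword, a real decoder would decode codewords in sequence and treat only the remaining "0"s as fill.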

V. Performance of the One-Dimensional
Coding Scheme

The performance of the one-dimensional coding scheme was
assessed by computer simulation. Measurements were made
on run-length data obtained for the 8 CCITT A4 reference
documents which were recorded on magnetic tape by CCITT
on behalf of the French Administration of Posts and
Telecommunications. Each document was scanned at high resolution
(7.7 lines/mm) and contained 2376 lines. Measurements made
at normal resolution (3.85 lines/mm) used the 1188 odd
numbered lines.

Two main types of simulation were performed to assess the
error sensitivity of the one-dimensional coding scheme. Firstly,
coded data obtained using the one-dimensional coding scheme
was subjected to single random errors in order to study the
resynchronization properties of the modified Huffman code
(Section V-B). This method was originally designed to
compare the resynchronization properties of various variable
length, comma-free codes. Secondly, error susceptibility
measurements were made on a number of documents using
real error patterns obtained from actual telephone lines. The
error patterns, which were recorded on magnetic tape, were
produced by the University of Hannover by transmitting
pseudorandom patterns over telephone lines using a 4.8
kbit/s V27 ter modem and by comparing them with the
received patterns. The magnetic tape contained error patterns
for 4 telephone lines and the first of these was used to
produce the results given later in this paper.

A.  Compression  Factors 

Table III lists the average run lengths, entropies, and
maximum theoretical compression factors Qmax for the run-length
statistics obtained from the 8 CCITT documents
scanned at high resolution. Table IV gives the corresponding
values for the modified statistics obtained by breaking
up runs greater than 63 into 2 runs as required by the
modified Huffman code and indicates that Qmax has been reduced
by about 14 percent. To indicate that the coding scheme
performs efficiently for a wide range of documents, Table
IV also includes the actual compression factors C which
were calculated using the modified Huffman code but
excluding EOL codewords and fill bits.

Table V, which lists the number of coded bits obtained for
the documents scanned at normal (low) and high resolution,
indicates that an increase in the MSLT parameter can
substantially increase the number of bits required to code a document.
At low resolution, the average transmission times for the
eight documents using a transmission rate of 4800 bit/s are:
56.6 s for a MSLT of 0 ms, and 60.5 s for a MSLT of 20 ms.
At high resolution, the corresponding times are 111 and 121 s,
respectively.

Care should be taken when comparing the compression
values listed in this paper with those obtained by other
experimenters. Even different scanners operating at the same
resolution and scanning the same documents can give significantly
different statistics and compression factors. For example,
another version of the 8 CCITT documents produced
compression factors which were on average 13 percent lower than
the values quoted in this paper.

B.  Analysis  and  Measurement  of  the  Error  Effects  on  a  Scan 
Line  of  Coded  Data 

When a coded scan line of facsimile data is disturbed by a
transmission error, a number of related effects are produced.
At least one of the codewords will be corrupted and
incorrectly decoded. Also, the length of the codeword recognised
by the decoder may be different from that generated by the
coder so that the coder and decoder will become out of step.
Decoding will continue since the codes are exhaustive
(ignoring for a moment the redundancy associated with the EOL
codewords) but codewords recognised by the decoder may
not be the same as those formed by the coder. Eventually
resynchronization will occur, this being a property of most
comma-free codes [18], [19], after which the data will be
correctly decoded. If resynchronization does not occur
naturally along the coded line then the specially constructed
EOL codeword forces resynchronization at the end of the
coded line.

Fig. 4 shows an example of the effect of a single error on
part of a typical coded data stream. The resynchronization
period is defined as the number of coded bits between the
beginning of the codeword corrupted by an error and the end
of the codeword on which word resynchronization takes
place. (MUC or terminating codewords associated with the
first and last codeword in the resynchronization period are
included.) Occasionally, the error does not cause loss of
synchronization as the length of the corrupted codeword
decoded by the decoder is the same as that formed by the
coder. In this case, the resynchronization period is equal to
the length of the corrupted codeword.

Fig. 4. Effect of an error on coded data.

Fig. 5. Effect of an error on a scan line.

Fig.  5  shows  the  corresponding  effect  of  the  same  error  on 
the  actual  decoded  scan  line.  A  sequence  of  correct  (“lost”) 
run  lengths  is  replaced  by  a  sequence  of  incorrect  (“false”) 
run lengths. If the correct run lengths generated after
resynchronization could be replaced in their true positions,
the  actual  number  of  damaged  pels  would  be  reduced  to  the 
sequence  labelled  “lost”  pels.  The  amount  by  which  these 
correct  run  lengths  are  shifted  to  the  right  or  left  of  their 
true  positions  is  called  the  displacement. 

The resynchronization periods and displacements of the
modified Huffman code, as described above, cannot be easily
calculated and therefore were measured experimentally using
computer simulation techniques. Single errors were randomly
inserted in the coded data obtained from a document and a
computer program was used to decode simultaneously both a
correct version of the coded data stream and a version
containing the single error. When a corrupted codeword was
detected, its position was recorded and decoding was
continued until resynchronization in terms of both “color,”
i.e., black and white, and codeword length took place between
the 2 data streams. About 2000 errors were inserted into the
data obtained from each coded document in such a way that
once one error had been inserted, no further error was inserted
until resynchronization had occurred.
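
The measurement procedure can be illustrated with a much simplified Python sketch. It uses a four-word toy prefix code, assumed purely for illustration in place of the Table I codes, and tracks only codeword-length resynchronization; the color resynchronization that the authors also required is ignored. All names are hypothetical.

```python
# Toy prefix code, assumed for illustration only (not the Table I code).
TOY_CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
CODEWORDS = set(TOY_CODE.values())


def codeword_ends(bits):
    """Bit offsets at which the decoder completes a codeword."""
    ends, current = [], ""
    for i, bit in enumerate(bits):
        current += bit
        if current in CODEWORDS:
            ends.append(i + 1)
            current = ""
    return ends


def resync_period(bits, err_pos):
    """Resynchronization period in bits for one inverted bit, or None if
    the two decodings do not realign before the data runs out."""
    flipped = "1" if bits[err_pos] == "0" else "0"
    corrupted = bits[:err_pos] + flipped + bits[err_pos + 1:]
    good = codeword_ends(bits)
    bad = set(codeword_ends(corrupted))
    # Start of the codeword hit by the error.
    start = max([0] + [end for end in good if end <= err_pos])
    # Both streams are identical after the error, so they are back in
    # step at the first codeword boundary they share.
    for end in good:
        if end > err_pos and end in bad:
            return end - start
    return None


bits = "".join(TOY_CODE[s] for s in "acbad")
for pos in (1, 4, 7):
    print(pos, resync_period(bits, pos))
```

A result of None corresponds to the case in which resynchronization does not occur naturally along the line and must be forced by the EOL codeword.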

Table VI summarizes the results obtained for 4 documents
scanned at high resolution. The median values show that for
half the errors, the number of lost runs and lost pels is less
than 50 and 20 pels, respectively. These are equivalent to
disturbances of 6.2 and 2.5 mm along a scan line, respectively.
These results show that in most cases the modified Huffman
code recovers very quickly from disturbances and indicate that
one of the correlation processes described in Section V-D
could be used to reduce substantially the damage caused by a
large proportion of such errors.

C. End-of-Line and Fill Bit Errors

Three  effects  or  events  are  considered. 

1) Lost EOL: This occurs when an error corrupts the EOL
codeword in such a way that it cannot be recognized. Using
any of the error concealment techniques described in Section
V-D, two lines adjacent to the EOL will be replaced by a single
line.

2)  Premature  EOL:  This  occurs  when  an  error  occurs  in 
the  fill  bits.  It  creates  a  spurious  or  false  EOL  and  an  extra 
line  may  be  inserted  in  the  document.  Premature  EOL’s  may 
be  recognized  since  they  produce  coded  lines  with  fewer 
coded  bits  than  specified  by  the  MSLT. 

3)  False  EOL:  An  error  corrupts  a  coded  scan  line  in  such 
a  way  as  to  create  a  spurious  EOL.  An  extra  line  will  be  added 
to  the  document. 

Statistically,  the  average  number  (A)  of  EOL’s  hit  by  single 
random  errors  during  the  transmission  of  a  document  is  equal 
to  the  number  of  errors  occurring  on  the  document  multiplied 
by  the  proportion  of  coded  bits  assigned  to  EOL  codewords 

$$A = (E \times N) \times \frac{L \times M}{N} = E L M \qquad (6)$$

where E is the error rate, N is the total number of coded bits,
L is the length of the EOL codeword, and M is the number of scan
lines. Thus A is independent of the number of bits required
to transmit the document and, for a given EOL length and
number of scan lines, depends only on the error rate.
For example, if E is 1 in 10^4 and M is 2376 lines, then A is
equal to 2.8. If bursts of errors are considered, each of which
is spread over B bits, then the above equation becomes
A = E(L + B - 1)M. If, for example, E is a burst error rate
of 1 in 10^5 and B is 10 bits then A equals 0.5. On an actual
telephone line, the burst length can vary widely and the values
of A may differ from those given in this simple analysis.
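
The arithmetic behind these examples can be reproduced with a small Python sketch. It assumes an EOL length L of 12 bits (11 × “0” + “1”, as in Section IV-C) and takes error rates of 1 in 10^4 for single errors and 1 in 10^5 for bursts as illustrative values, since with these inputs the formula gives roughly the quoted 2.8 and 0.5. The function name is hypothetical.

```python
def expected_eol_hits(error_rate, eol_len, lines, burst_spread=1):
    """Average number of EOLs hit: A = E (L + B - 1) M.

    With burst_spread = 1 this reduces to A = E L M of equation (6).
    """
    return error_rate * (eol_len + burst_spread - 1) * lines


# Single random errors (assumed E = 1e-4, L = 12 bits, M = 2376 lines).
print(expected_eol_hits(1e-4, 12, 2376))
# Bursts spread over B = 10 bits (assumed E = 1e-5).
print(expected_eol_hits(1e-5, 12, 2376, burst_spread=10))
```

The total number of coded bits N does not appear as an argument, which is the independence property noted in the text.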

Computer simulation methods were used to assess the
frequency of occurrence of the three events listed above. Four
of the CCITT documents scanned at high resolution were
coded with a MSLT of 20 ms and the coded data was
subjected to the first of the 4 real error patterns recorded by the
University of Hannover. The corrupted data was decoded by
a computer program and a record of the three EOL error
events was kept. Three test runs were performed for each
document. In the first run, the beginning of the error pattern
coincided with the start of the message (Fig. 3). In runs 2
and 3, the start of the coded data coincided respectively with
the 1024th and 2048th bits of the error pattern. The results
obtained for the 3 runs are given in Table VII.

The real error pattern has an average bit error rate of 7 J in
10^4, a burst error rate of 7.1 in 10* (guard length of 100
bits), 99 percent of bursts have a spread of between 9 and
16 bits, the average burst spread is 10 bits and the probability
of a block of 2048 bits being in error is 0.08. However, these
statistics disguise the fact that the distribution of the errors
in this real error pattern is very uneven. Because of this and
because the number of bits required to code each document
varies, the values in Table VII differ considerably. For
example, the average burst error rate for document 5 is higher
than that for the other documents and this produces a large
number of lost EOL's.

The error rates quoted above for the real error pattern are
regarded as being rather high. For example, measurements by
Balkovic et al. [20] indicate that 95 percent of connections
on the Bell Telecommunications Network have lower error
rates and 50 percent of connections have burst error rates of
less than 1 in 10* for a transmission rate of 4800 bit/s.
Although error rates for many telephone networks are not
available (error rates are difficult to classify because of their
wide diversity), there is no evidence to show that, in general,
they differ substantially from Balkovic’s results. Thus it may
be concluded that the error disturbance on most documents
will be small and substantially lower than indicated by the
results in Table VII.

D. Error Concealment Techniques

Unless  the  number  of  decoded  pels  between  2  successive 
EOL  codewords  is  equal  to  1728,  then  it  can  be  assumed  that 
an  error  has  occurred  in  a  transmitted  line.  In  this  case  one  of 
the  following  error  concealment  processes  may  be  adopted: 

1) replace the damaged line by an all white line;

2) repeat the previous line;

3) print the damaged line;

4) use a line-to-line processing or correlation technique to
reconstruct as much of the line as possible.

The run lengths decoded after resynchronization are
displaced from their correct positions on the scan line. If the
displacement is more than about 4 pels, then the picture
information on the damaged line will become disassociated
from that on adjacent lines. Since this effect is very
noticeable and causes a disturbing streak across a page, it is usually
preferable to use method 1) or 2) above rather than to print
the damaged line. For small displacements, however, method
3) may provide a simple means of minimising the loss of
information due to an error.
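
The pel-count check and the first three concealment methods can be sketched in Python as follows. The function name, the method labels, and the pad-or-truncate behaviour used for method 3) are assumptions made for this illustration; lines are assumed to be lists of 0 (white) and 1 (black).

```python
LINE_PELS = 1728  # pels in one correctly decoded A4 scan line
WHITE = 0         # assumed pel value for white


def conceal(decoded, previous, method="repeat"):
    """Return a 1728-pel line, concealing damage if the pel count is wrong."""
    if len(decoded) == LINE_PELS:
        return list(decoded)       # pel count correct: no error detected
    if method == "white":          # method 1): all white line
        return [WHITE] * LINE_PELS
    if method == "repeat":         # method 2): repeat the previous line
        return list(previous)
    if method == "print":          # method 3): pad or truncate (assumption)
        return (list(decoded) + [WHITE] * LINE_PELS)[:LINE_PELS]
    raise ValueError("unknown concealment method: " + method)


previous = [1] * LINE_PELS
damaged = [1] * 100                # wrong pel count, so an error is assumed
print(sum(conceal(damaged, previous, "white")))   # count of black pels
print(sum(conceal(damaged, previous, "repeat")))
```

An error that happens to leave the pel count at 1728 passes this check undetected, so the sketch only captures the detection rule stated in the text.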

Correlation methods 4) take advantage of the fact that the
recovery period is often short and attempt to retain as much
as possible of the correctly decoded data on a damaged line.
This is achieved by attempting to locate the damaged run
lengths. One method [21] is to measure the correlation
between groups of pels on the damaged line with corresponding
groups on the adjacent lines above and below. Where the
correlation is good, generally at the beginning and end of the
damaged scan line, the scan line data is used to reconstruct
the line. The part of the scan line which is assumed to be
damaged is then replaced by a corresponding part of the
previous line. Other interesting reconstruction methods are
described in the literature [3], [22].
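
A much simplified sketch of such a correlation approach is given below in Python. It is not the method of [21]: it compares the damaged line with the previous line only, uses an assumed group size of 16 pels and an assumed agreement threshold of 0.9, and aligns the trailing groups to the right-hand edge of the line so that displaced run lengths near the end fall back into place. All names and parameters are hypothetical.

```python
def reconstruct(damaged, previous, group=16, threshold=0.9):
    """Keep well-correlated leading and trailing groups of the damaged
    line and fill the middle from the previous line."""
    n, d = len(previous), len(damaged)
    limit = min(n, d)

    def agree(a, b):
        return sum(x == y for x, y in zip(a, b)) / group

    # Leading groups, aligned to the left-hand edge.
    left = 0
    while (left + group <= limit and
           agree(damaged[left:left + group],
                 previous[left:left + group]) >= threshold):
        left += group

    # Trailing groups, aligned to the right-hand edge.
    right = 0
    while (left + right + group <= limit and
           agree(damaged[d - right - group:d - right],
                 previous[n - right - group:n - right]) >= threshold):
        right += group

    return damaged[:left] + previous[left:n - right] + damaged[d - right:]


previous = [0] * 32 + [1] * 32 + [0] * 32
# Damaged line: one genuine difference at pel 0, corrupted runs in the
# middle, and 3 pels too many in total.
damaged = [1] + [0] * 31 + [1] * 10 + [0] * 5 + [1] * 20 + [0] * 32
restored = reconstruct(damaged, previous)
print(len(restored), restored[0])
```

In this example the genuine black pel at the start of the damaged line survives, which a plain repeat of the previous line would lose.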

VI.  Selection  of  a  Two-Dimensional 
Coding  Scheme 

There is considerable interest in two-dimensional facsimile
coding techniques, particularly in Japan, as is illustrated by
Yasuda in this issue [32]. It is asserted that such coding
schemes speed up the transmission of documents, especially
when they are scanned at high resolution, without significantly
increasing system costs. Although two-dimensional coding
schemes are more vulnerable to transmission errors since the
effect of a single disturbance can propagate over several lines,
it is felt that the increase in document degradation is not large
enough, in general, to deter their use. As a result, Japan [23]
proposed that a two-dimensional code called the relative
element address designate (READ) code should be included as
an option in the T.4 Recommendations for Group 3
equipment. This code combines some of the techniques used in two
earlier coding schemes, RAC and EDIC (these and the READ
code are described by Yasuda), and is specifically designed to
be an extension of the Group 3 one-dimensional coding
scheme. Interest in this code is substantial since it has proved
to be more efficient than RAC, EDIC, and many other
two-dimensional coding proposals.

SGXIV subsequently received further contributions concerning
two-dimensional coding schemes, which fall into 2 categories.
IBM Europe [24], the 3M Company [25], AT&T
[26], and the British Post Office [27] proposed schemes
which are also designed to be direct extensions of the
one-dimensional code. The Federal Republic of Germany [28]
and the Xerox Corporation [29] proposed schemes based on
predictive coding [30]. SGXIV agreed on a procedure for measuring
the compression efficiencies and for assessing the error
susceptibilities of the codes, and during 1979 the codes were
extensively tested by SGXIV delegates. The performances of
the codes were then examined at a SGXIV meeting in Kyoto,
Japan, in November 1979. A comparison of the codes shows
that, from a technical point of view, there is little difference
between them in terms of their compression efficiency and
error susceptibility. The READ code was strongly supported
because it had already been built into a large number of commercial
machines. However, some SGXIV delegates suggested
that a number of modifications to the READ coding procedure
would simplify its implementation without significantly
changing its compression efficiency. The following alterations
were suggested:

1) Vertical mode coding should be restricted so that the
examination of the reference line does not extend beyond
±3 pels. The statistics for the coding elements obtained for
the READ code show that horizontal mode coding was nearly
always more efficient than vertical mode coding when the
examination of the reference line is extended beyond 3 pels. This
restriction simplifies the implementation since it is not necessary
to code every changing element by both horizontal and
vertical mode coding.

2) The necessity to add insertion bits (bit stuffing) into the
coded data should be avoided. It was generally agreed that the
use of insertion bits in the READ code to ensure a unique line
synchronization sequence would add extra implementation
complexity.

3)  The  EOL  codeword  should  be  made  the  same  as  that 
used  in  the  one-dimensional  coding  procedure.  This  ensures 
that  the  code  retains  its  resynchronization  properties  and 
avoids  the  need  for  bit  stuffing. 

4) The code should cater for future extensions. In particular,
a number of delegates expressed their desire for the
two-dimensional code to provide an uncompressed mode. Later it
may also be desirable to include more sophisticated coding
procedures such as feature extraction or pattern recognition
techniques and the coding of gray or colored areas.

After considerable discussion a suitable compromise, called
the modified READ code, was proposed by the Japanese delegation.
This proposal incorporated many of the features described
above and was readily supported by SGXIV.

VII. The Two-Dimensional Coding Scheme

The modified READ code is a line-by-line scheme in which
the position of each changing element on the coding line is
coded with respect to either the position of a corresponding
changing element on the reference line, which lies immediately
above the coding line, or with respect to the preceding changing
element on the coding line. After the coding line has been
coded, it becomes the reference line for the next coding line.
In order to prevent the vertical propagation of damage caused
by transmission errors, no more than K - 1 successive lines are
two-dimensionally coded and the next line is one-dimensionally
coded. Usually K is set equal to 2 at normal resolution and set
equal to 4 at high resolution. Before describing the coding
procedure, it is necessary to define the changing pels and the
3 coding modes used in the coding procedure.

A.  Definition  of  Changing  Picture  Elements 

Definition:  A  changing  picture  element  is  an  element  whose 
“color”  (black  or  white)  is  different  from  that  of  the  previous 
element  along  the  same  line. 

The coding algorithm makes use of 5 changing elements situated
on the coding and reference lines. These are defined
below with examples given in Fig. 6.

a0: The reference or starting changing element on the coding
    line. Its position is defined by the previous coding
    mode as described in Section VII-C. At the start of the
    coding line, a0 is set on an imaginary white changing
    element situated just before the first actual element on
    the coding line.

a1: The next changing element to the right of a0 on the
    coding line. This has the opposite color to a0 and is the
    next changing element to be coded.

a2: The next changing element to the right of a1 on the
    coding line.

b1: The next changing element on the reference line to the
    right of a0 and having the same color as a1.

b2: The next changing element on the reference line to the
    right of b1.

If any of the coding elements a1, a2, b1, b2 are not detected
at any time during the coding of the line, then they are set on
an imaginary element positioned just after the last actual element
on the respective scan line.
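
The definitions above can be illustrated with a short sketch. It assumes lines are Python lists of pels (0 = white, 1 = black), that a0 = -1 stands for the imaginary white element before the line, and that len(line) stands for the imaginary element after it; the function names are hypothetical.

```python
def changing_elements(line):
    """Indices of pels whose color differs from the previous pel.

    An imaginary white pel (0) is assumed to precede the line.
    """
    prev, out = 0, []
    for i, pel in enumerate(line):
        if pel != prev:
            out.append(i)
        prev = pel
    return out


def locate(coding, reference, a0):
    """Return (a1, a2, b1, b2) for the reference element a0."""
    n = len(coding)
    a0_color = 0 if a0 < 0 else coding[a0]
    # a1, a2: next two changing elements to the right of a0 on the coding line.
    code_changes = [i for i in changing_elements(coding) if i > a0]
    a1 = code_changes[0] if code_changes else n
    a2 = code_changes[1] if len(code_changes) > 1 else n
    # b1: first reference-line change right of a0 with the color of a1,
    # i.e. the opposite color to a0; b2: the next change after b1.
    ref_changes = [i for i in changing_elements(reference) if i > a0]
    b1 = next((i for i in ref_changes if reference[i] != a0_color), n)
    b2 = next((i for i in ref_changes if i > b1), n)
    return a1, a2, b1, b2
```

For example, with reference line 0,0,1,1,0,0,0,0 and coding line 0,0,0,1,1,1,0,0 at the start of the line, this sketch places a1 on pel 3, a2 on pel 6, b1 on pel 2, and b2 on pel 4 (counting from 0).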

B. Definition of Coding Modes

The coding procedure uses 3 coding modes which are defined
below and illustrated by the examples given in Fig. 6.


Fig. 6. (a) Pass mode. (b) Vertical and horizontal modes.

1) Pass Mode Coding: This is identified when the position
of b2 lies to the left of a1. The purpose of the pass mode is to
identify white or black runs on the reference line which are
not adjacent to corresponding white or black runs on the coding
line. The pass mode is represented by a single codeword
in the two-dimensional code table (Table VIII).

2) Vertical Mode Coding: When this mode is identified, the
position of a1 is coded relative to the position of b1. The relative
distance a1b1 can take on one of seven values V(0),
VR(1), VR(2), VR(3), VL(1), VL(2), and VL(3), each of
which is represented by a separate codeword. The subscripts
R and L indicate that a1 is to the right or left, respectively, of
b1, and the number in brackets indicates the value of the
distance a1b1.

3) Horizontal Mode Coding: If vertical mode coding cannot
be used to code the position of a1, then its position must be
coded by horizontal mode coding. That is, the run lengths
a0a1 and a1a2 are coded using the codewords H + M(a0a1) +
M(a1a2). H is the flag codeword “001” taken from the
two-dimensional code table (Table VIII) and M(a0a1) and M(a1a2)
are codewords taken from the appropriate modified Huffman
code tables to represent the colors and values of the run
lengths a0a1 and a1a2.

C.  The  Coding  Procedure 

Having determined the next set of changing elements a1, a2,
b1, and b2, the coding procedure identifies the next coding
mode, selects the appropriate codeword from Table VIII, and
then resets the reference element a0 as defined below. The
coding procedure is formally defined by the flow diagram
shown in Fig. 7 and basically consists of 2 steps.

a) Step 1:

i) If b2 is detected before a1, then a pass mode has been
identified and the codeword “0001” is issued. The reference
element a0 is set on the element below b2 in preparation for
the next coding.

ii) If a pass mode is not detected, proceed to Step 2.

b) Step 2: Determine the number of elements which separate
a1 and b1.

i) If |a1b1| ≤ 3, then code the relative distance a1b1 by
vertical mode coding. Set a0 on the position of a1 in preparation
for the next coding.

ii) If |a1b1| > 3, then code the positions of a1 and a2 by
horizontal mode coding, i.e., transmit the codewords H +
M(a0a1) + M(a1a2). After the coding, a2 is regarded as the
new position of the reference element a0.
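
The two steps can be sketched as a mode-selection loop. This is an illustrative sketch only: it emits symbolic tokens rather than the Table VIII codewords, it repeats assumed helper functions so that it is self-contained, and all names are hypothetical.

```python
def changing_elements(line):
    """Indices of pels that differ from the previous pel (white precedes)."""
    prev, out = 0, []
    for i, pel in enumerate(line):
        if pel != prev:
            out.append(i)
        prev = pel
    return out


def locate(coding, reference, a0):
    """Return (a1, a2, b1, b2); len(coding) marks 'not detected'."""
    n = len(coding)
    a0_color = 0 if a0 < 0 else coding[a0]
    code_changes = [i for i in changing_elements(coding) if i > a0]
    a1 = code_changes[0] if code_changes else n
    a2 = code_changes[1] if len(code_changes) > 1 else n
    ref_changes = [i for i in changing_elements(reference) if i > a0]
    b1 = next((i for i in ref_changes if reference[i] != a0_color), n)
    b2 = next((i for i in ref_changes if i > b1), n)
    return a1, a2, b1, b2


def encode_line_2d(coding, reference):
    """Return the sequence of coding modes chosen for one coding line."""
    n, a0, tokens = len(coding), -1, []
    while a0 < n:
        a1, a2, b1, b2 = locate(coding, reference, a0)
        if b2 < a1:                       # Step 1: pass mode
            tokens.append("P")
            a0 = b2
        elif abs(a1 - b1) <= 3:           # Step 2 i): vertical mode
            d = a1 - b1
            side = "R" if d > 0 else "L"
            tokens.append("V(0)" if d == 0 else f"V{side}({abs(d)})")
            a0 = a1
        else:                             # Step 2 ii): horizontal mode
            # At the line start a0 is imaginary, so the run is a0a1 - 1.
            first_run = a1 - max(a0, 0)
            tokens.append(("H", first_run, a2 - a1))
            a0 = a2
    return tokens
```

The loop ends once the imaginary changing element after the last pel has been coded. A real coder would replace each token with its codeword from Table VIII and the modified Huffman tables.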

It is possible to vary the above procedure without affecting
the compatibility between coder and decoder, but further
studies into the use of these variations are required. For example,
it is possible to restrict the use of the pass mode to a single
pass mode to prevent long sequences of pass modes which
might give inefficient coding. Also, if |a1b1| ≤ 3, each a1 may
be coded by both vertical and horizontal mode coding and
the most efficient coding mode chosen, as in the original READ
code. However, preliminary tests have not indicated that these
particular variations lead to increased compression factors.

D.  Coding  the  First  and  Last  Elements  on  a  Line 

If horizontal mode coding is used to code the first element
on the coding line, then the value of a0a1 is replaced by
a0a1 - 1 to ensure that the correct run-length value is transmitted.
Therefore, if the first element on a line is black, then
the first codeword M(a0a1) will be that which represents a
white run of zero length.

The coding of the line continues until the imaginary changing
element situated just after the last actual element on the
coding line has been coded. Thus exactly 1728 elements are
coded on each line and the receiver can check that each line
has been correctly decoded.

E.  The  Code  Table 

The two-dimensional code table is given in Table VIII and is
also drawn in the form of a code tree in Fig. 8. The code tree
is constructed so that it contains the codeword “0000000”,
which is then extended to form the EOL codeword 11 ×
“0” + “1”. The remaining codewords are then added to the
code tree according to the relative frequencies of the required
coding elements. These frequencies were obtained by computer
simulation tests on the CCITT documents. Finally the
two-dimensional extension codeword (Section VII-I) is assigned
to the shortest remaining codeword. This construction
method ensures that the same unique EOL codeword is used
whether a line is coded by the one- or two-dimensional coding
procedure.

F. EOL Codeword, Tag Bits, Fill Bits, and Return to Control

Each EOL codeword is followed by a single tag bit, a “1” or
a “0”, which indicates that the next line is one- or
two-dimensionally coded, respectively. “Fill” bits consisting of variable
length strings of “0”'s are inserted, when required, at the end
of a coded line and before the EOL and tag bit. The return to
control (RTC) signal consists of 6 consecutive EOL codewords,
each of which is followed by a “1” tag bit.
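
As a small illustration of these framing rules, the bit strings can be assembled as below. Bits are represented as Python strings purely for readability; the function names are hypothetical.

```python
# EOL codeword: eleven "0" bits followed by a single "1".
EOL = "0" * 11 + "1"


def end_of_line(next_line_is_1d, fill_bits=0):
    """Optional fill bits, then EOL, then the tag bit for the next line."""
    tag = "1" if next_line_is_1d else "0"
    return "0" * fill_bits + EOL + tag


def return_to_control():
    """RTC: six consecutive EOL codewords, each followed by a "1" tag bit."""
    return (EOL + "1") * 6
```

Because fill bits are “0”'s placed immediately before the EOL, they simply lengthen the run of zeros that precedes the terminating “1”.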

PROCEEDINGS OF THE IEEE, VOL. 68, NO. 7, JULY 1980

G. K Factor

As stated earlier, it is recommended that, after a
one-dimensionally coded line, not more than K - 1 successive lines are
two-dimensionally coded, where K is equal to 2 for documents
scanned at normal resolution, and 4 for those scanned at high
resolution. More scan lines than suggested by the value of K
can be one-dimensionally coded without affecting compatibility
if this proves useful in terms of either compression or error
susceptibility.
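
A minimal sketch of the resulting line schedule, assuming the coder always uses the maximum permitted number of two-dimensionally coded lines (the function name is hypothetical):

```python
def coding_schedule(num_lines, k):
    """Label each scan line "1D" or "2D".

    Every one-dimensionally coded line is followed by at most k - 1
    two-dimensionally coded lines; this sketch always uses exactly k - 1.
    """
    return ["1D" if i % k == 0 else "2D" for i in range(num_lines)]
```

With K = 2 the lines alternate between the two codes, and with K = 4 every fourth line is one-dimensionally coded, which bounds how far an error can propagate down the page.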

H. Uncompressed Mode

Both one- and two-dimensional coding of some detailed documents
leads to localized data expansion where the number of
bits exceeds the number of pels. Documents containing
screened photographic images, or areas of cross hatching as on
some business forms, can produce this effect. To cater for
these situations, an uncompressed mode, proposed by IBM
[31], has been suggested as an option to the two-dimensional
coding scheme. Entry to the uncompressed mode on a one-
and two-dimensionally coded line is achieved by using the one-
and two-dimensional extension codewords, respectively, given
in Table VIII, with the bits XXX set to 111. The other combinations
of XXX are reserved for other as yet unspecified
extensions. However, on a one-dimensionally coded line, the
coder does not enter uncompressed mode following a codeword
ending in the bit sequence “000”. This prevents the
detection of false EOL's by the decoder caused by concatenation
of this sequence with the one-dimensional extension
codeword.

The following example of an uncompressed mode is given in
Recommendation T.4. Once the uncompressed mode has been
entered, the source data itself is transmitted, with a “0” representing
a white pel and a “1” representing a black pel. Each
group of 5 successive “0”'s must be followed by an insertion
bit “1”. The insertion bits are discarded by the decoder. The
corresponding code table is

Image Pattern     Codeword

1                 1
01                01
001               001
0001              0001
00001             00001
00000             000001
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200655 

1505T2  | 

10X221 

2250T? 

30*069 

16*005 

“C23J3 

“20119 

6516*3 

690*79 

565135 

13-369  | 

210*57 

26*029 

333066 

10167“ 

22TUT  | 

259T0T 

35503* 

•3*909 

295019 

TABLE X

Frequencies of Coding Elements, 7.7 Lines/mm, K = 4
(a "?" marks a digit that is illegible in the source)

P      H      V(0)    VR(1)   VR(2)  VR(3)  VL(1)   VL(2)

166?   265?   16??5   5053    9?9    316    59?7    1035
7730   10032  59075   22003   ?190   921    2?790   ?671
322?   ?310   35520   11396   1592   362    10050   1306
0612   12392  ?9661   21079   ?301   1?70   15059   2022

The uncompressed mode “1” insertion process allows the
use of the following 5 exit codewords:

Image Pattern     Codeword

(none)            0000001T
0                 00000001T
00                000000001T
000               0000000001T
0000              00000000001T

The flag bit T denotes the color of the next run: black is 1,
white is 0.
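
The insertion-bit rule and the exit codewords can be sketched together. This is an illustration under stated assumptions: bits are held in Python strings, the entry (extension) codeword is omitted, and the function names are hypothetical.

```python
def uncompressed_encode(pels, next_run_black):
    """Encode pels (0 = white, 1 = black) and append the exit codeword."""
    out, zeros = [], 0
    for pel in pels:
        out.append(str(pel))
        zeros = zeros + 1 if pel == 0 else 0
        if zeros == 5:
            out.append("1")  # insertion bit after 5 successive "0"s
            zeros = 0
    # Up to 4 trailing "0"s are already in `out`; "0000001T" completes the exit.
    return "".join(out) + "0000001" + ("1" if next_run_black else "0")


def uncompressed_decode(bits):
    """Return (pels, next_run_black), discarding insertion bits."""
    pels, zeros = [], 0
    for i, bit in enumerate(bits):
        if bit == "0":
            zeros += 1
        elif zeros >= 6:
            # Six or more "0"s cannot occur in stuffed data: exit codeword.
            pels.extend([0] * (zeros - 6))
            return pels, bits[i + 1] == "1"
        else:
            pels.extend([0] * zeros)
            if zeros < 5:
                pels.append(1)  # a real black pel
            # zeros == 5: this "1" is an insertion bit and is dropped
            zeros = 0
    raise ValueError("no exit codeword found")
```

Since the stuffed data never contains more than five consecutive “0”'s, a run of six or more zeros unambiguously signals an exit codeword.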

At the present time, the uncompressed mode is still under
review by SGXIV and has not been fully tested. Procedures
are being considered which will determine the optimum entry
and exit points for the uncompressed mode.

VIII. Performance of the Two-Dimensional
Coding Scheme

Table IX lists the number of coded bits for the 8 CCITT
documents using the two-dimensional coding procedure. At
normal (low) resolution with K = 2 at a transmission rate of
4800 bit/s, the average transmission times are 47.3 and 54.1 s
with MSLT equal to 0 and 20 ms, respectively. At high resolution
with K = 4, the corresponding average times are 74.0 and
90.6 s, respectively. These 4 average times are, respectively,
16.4, 10.6, 33.3, and 25.1 percent lower than the corresponding
average times achieved using one-dimensional coding. At
high resolution, with K = infinity and MSLT set to 0 ms, the average
transmission time is 61.5 s, which is 44.7 percent lower
than the transmission time for one-dimensional coding.

TABLE XI

Effects of Errors on EOL Sequences for
Two-Dimensional Coding

Table X lists the frequencies of the coding elements obtained
for 4 of the CCITT documents scanned at high resolution with
K = 4.

HUNTER AND ROBINSON: INTERNATIONAL FACSIMILE CODING STANDARDS

Useful increases in the compression factors can be obtained
by coding each line by both one- and two-dimensional coding
and selecting the coded line with the fewest bits. This selection
is carried out under the restriction that no more than
K - 1 successive lines are two-dimensionally coded. For documents
1, 4, 5, and 7 scanned at high resolution (MSLT = 0 ms,
K = 4), the number of coded bits obtained were 194622,
611132, 333837, and 597175, respectively. These values are
on average 7 percent lower than the corresponding
values obtained with a fixed value of K.
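
The per-line selection can be sketched as a simple greedy rule. This is an assumption about how the selection might be organized, not the procedure used in the tests reported here; the per-line bit counts are taken as given inputs and the function name is hypothetical.

```python
def select_modes(bits_1d, bits_2d, k):
    """Choose "1D" or "2D" for each line from its two coded lengths.

    bits_1d[i] and bits_2d[i] are the coded lengths of line i under each
    scheme. The first line is always 1D, and no more than k - 1
    successive lines are 2D.
    """
    modes, run_2d = [], 0
    for i, (one_d, two_d) in enumerate(zip(bits_1d, bits_2d)):
        if i > 0 and run_2d < k - 1 and two_d < one_d:
            modes.append("2D")
            run_2d += 1
        else:
            modes.append("1D")  # 1D is shorter, or the 2D run limit is reached
            run_2d = 0
    return modes
```

A line is forced back to one-dimensional coding either because it is genuinely shorter that way or because the limit of K - 1 successive two-dimensional lines has been reached.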

Table XI lists the results obtained by subjecting the real error
pattern to the coded data obtained for 4 of the documents
scanned at high resolution (MSLT = 20 ms, K = 4). Three test
runs were carried out as described in Section V-C.

IX. Conclusion

The one-dimensional run-length coding scheme and the
two-dimensional modified READ code which we have described in
this paper are expected to form the basis of a CCITT standard
(Recommendation T.4) which will allow digital facsimile
equipment to interwork on national and international general
switched telephone networks.

The evaluation of the basic one-dimensional code, particularly
in respect to its compression efficiency, susceptibility to
errors, and implementation complexity, indicates that it provides
a good 1 min facsimile standard. The optional
two-dimensional code provides larger reductions in transmission
times for high resolution documents and is considered to be
one of the most efficient of the two-dimensional codes proposed
so far. The error susceptibility of the codes is difficult
to assess both objectively and subjectively. However, computer
simulation tests and manufacturers' field experience
indicate that the error disturbance on most documents transmitted
using the one-dimensional code is small. The
two-dimensional code is potentially more susceptible to errors,
and only time will tell whether it will always provide acceptable
copy quality on national and international telephone
networks.

The formation of the Group 3 standard, as in the case of the
standardization of Group 2 analog facsimile equipment, has
dramatically increased the degree of compatibility between
facsimile machines and has opened up many new applications
for this communication method. Although the recent steady
increase in the installation of Group 2 equipment is likely to
continue, it is expected that Recommendation T.4 will
stimulate the growth of digital facsimile equipment. Currently
efforts are being made to achieve compatibility between
communicating text handling terminals (e.g., Teletex) for use on
either telephone or data networks. Future international standards
will also aim for the transmission of facsimile over data
networks and for compatibility between facsimile and text
transmission systems.
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I.  Introduction 

MOST facsimile coding systems previously developed
have been based on the concept of run-length coding
[1]. Run-length coding methods provide a relatively
high compression ratio for a graphics type of document or an
alphanumeric document containing a small amount of text
[2]. But the achievable compression ratio drops appreciably
if a document is filled densely with alphanumeric characters
because the black and white run-lengths become quite short.
Dense alphanumeric documents can be efficiently coded by
symbol recognition techniques in which individual symbols are
detected and coded by a prototype library code [3], [4].
However, such a method cannot effectively handle documents
containing a mixture of alphanumerics and graphics. One
proposed approach to this problem has been to segment a
document into strips containing alphanumeric text or graphics
data, and then code the former by symbol matching and the
latter by run-length coding [5]. The problems with this
approach are the difficulty of document segmentation and the
drop in compression performance if the segmentation is not
accurate. This paper introduces a new concept of hybrid
symbol-matching/run-length coding in which a document is
dynamically segmented into symbol and graphics regions [6].

Conceptually, the symbol versus graphics segmentation process
employed in the facsimile compressor is quite simple. A
document is scanned line by line, and all isolated symbols that
are expected to recur in the document are extracted and coded
by a symbol-matching process. The remainder of the document,
called the residue, is coded by two-dimensional
run-length coding. This segmentation method permits document
symbols to be coded by symbol matching without interference
from the graphics portions of a document, and eliminates
symbols from that portion of the document which is
run-length coded. The result is an efficient match between the
type of data and the chosen coding methods.

The  symbol-matching  process  previously  described  has  been 
adapted  to  recognize  alphanumeric  characters  in  a  document. 
In  this  symbol  recognition  mode  of  operation,  the  document 
is  represented  by  conventional  printer  codes:  character,  space, 
carriage  return,  etc. 

The following sections describe the combined symbol matching
(CSM) algorithm for both the facsimile and symbol
recognition modes of operation.

II. Facsimile Coding Mode

The block diagram of Fig. 1 describes the basic elements of
the CSM coding system for facsimile coding. In operation, a
number of scan lines equal to about two to four times the
average character height are stored in a scrolled buffer. This
data is then examined line by line to determine if a black pixel
exists. If the entire line contains no black pixel, this information
is encoded by an end-of-line code. If a black pixel exists,
a blocking process is conducted to isolate the symbol. For
those isolated symbols, further processing is required to
determine if a replica of the symbol under examination already
exists in the library. This process involves the extraction
of a set of features, a screening operation to reject
unpromising candidates, and finally a series of template matches.
The first blocked character and its feature vector are always
put into the prototype library, and as each new blocked character
is encountered, it is compared with each entry of the
library that passes the screening test. If the comparison is
successful, the library identification (ID) code is transmitted
along with the location coordinates of the symbol. If the
comparison is unsuccessful, the new symbol is both transmitted
and placed in the library. Those areas of a document
in which the blocker cannot isolate a valid symbol are assigned
to a residue, and a two-dimensional run-length coding technique
is used to code the residue data. The following sections
describe key elements of the coder in greater detail.

A.  Symbol  Blocking 

The function of the symbol blocker is to examine the input
buffer in a systematic fashion, and to locate the position and
size of any isolated characters. Fig. 2 illustrates the blocking
process. A black pixel in the buffer, denoted by the character
“1”, is considered to be a key pixel whenever the four neighbors
located above it and to its left are white, as shown below:

    0 0 0
    0 1

Whenever a key pixel is encountered, the blocker is initiated.
The blocker extracts those pixels from the buffer that are
contiguous with the key pixel, or enclosed by a set of contiguous
black pixels. For example, with the lower case letter
“e,” all black pixels and the enclosed white “island” will be
extracted by the blocker.
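
A minimal sketch of key pixel detection and blocking follows. It assumes the buffer is a list of rows of 0/1 pixels, treats pixels outside the buffer as white, and uses 8-connectivity for "contiguous"; these choices and the function names are assumptions. Enclosed white islands are not collected in this sketch.

```python
def is_key_pixel(img, r, c):
    """True if (r, c) is black and its upper and left neighbors are white."""
    def px(rr, cc):
        inside = 0 <= rr < len(img) and 0 <= cc < len(img[0])
        return img[rr][cc] if inside else 0
    return (img[r][c] == 1
            and px(r - 1, c - 1) == 0 and px(r - 1, c) == 0
            and px(r - 1, c + 1) == 0 and px(r, c - 1) == 0)


def block_symbol(img, r, c):
    """Collect the black pixels connected to the key pixel at (r, c)."""
    stack, seen = [(r, c)], {(r, c)}
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < len(img) and 0 <= nx < len(img[0])
                        and img[ny][nx] == 1 and (ny, nx) not in seen):
                    seen.add((ny, nx))
                    stack.append((ny, nx))
    return seen
```

The bounding box of the returned pixel set gives the block height and width that later serve as screening features.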

B.  Feature  Extraction 

The most straightforward method to determine whether a
match exists between an unknown symbol and one of the
symbols stored in the library is to perform a template match
between the unknown and every library symbol. However, a
two-dimensional template match is costly in terms of processing
time and equipment. A method of reducing the number of
such matches is required. The approach that has been taken is
to extract a set of scalar “features” from the various symbols
in the library. These features are used to reduce or “screen”
the number of candidates for a template match to a tiny
fraction of all the possibilities in the library.

The  features  used  in  the  screening  process  are  the  block 
height,  block  width,  symbol  perimeter,  and  pixel  area  enclosed 
by  the  outer  boundary  of  the  symbol.  Fig.  3  provides  an 
example  of  features  derived  from  a  character. 

C.  Candidate  Screening 

The purpose of the screening process is to reduce the burden
on the template matcher by passing only “good prospects” to
the matcher. This is accomplished by calculating the feature
space distance between the unknown and each library entry,
and selecting the library candidate with the smallest distance
as the best prospect for a match. If this match is rejected, the
next best candidate is considered, and so forth. The distance
“metric” used to determine how “close” an unknown is to a
particular candidate is the “city block” distance defined by

    D(U, C) = \sum_{I=1}^{N_F} | F_C(I) - F_U(I) |        (1)

where F_C(I) is the Ith feature of the candidate, F_U(I) is the
Ith feature of the unknown, |·| denotes the absolute value,
D(U, C) is the distance between the unknown and candidate,
and N_F is the number of features.
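
Equation (1) and the ordering of candidates can be sketched as follows. The feature tuples and the dictionary layout of the library are assumptions made for illustration.

```python
def city_block(unknown, candidate):
    """City block distance D(U, C) between two feature vectors."""
    return sum(abs(fc - fu) for fu, fc in zip(unknown, candidate))


def rank_candidates(unknown, library):
    """Return library IDs ordered from best to worst prospect.

    library maps an ID to a feature tuple, e.g.
    (height, width, perimeter, enclosed area).
    """
    return sorted(library, key=lambda lib_id: city_block(unknown, library[lib_id]))
```

The template matcher would then try the candidates in this order, stopping at the first accepted match.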

D.  Template  Matcher 

The template matcher forms a comparison between the binary
patterns of a detected symbol and a library prototype symbol.
Consider a two-dimensional binary pattern represented by
A(C, R) where C = 1, 2, ..., N_C and R = 1, 2, ..., N_R. A
conventional template matcher calculates the similarity between
a pair of vector patterns A(C, R) and B(C, R) by summing
the number of picture elements (pixels) for which
A(C, R) and B(C, R) differ. This EXCLUSIVE OR error is defined
as

    \sum_{C=1}^{N_C} \sum_{R=1}^{N_R} A(C, R) \oplus B(C, R)        (2)

where \oplus denotes the Boolean EXCLUSIVE OR operation.


A major shortcoming of the conventional template matcher
described above is that it treats all errors alike regardless of
where they occur spatially. An improved matcher, to be
described, utilizes a “weighted EXCLUSIVE OR” error criterion
that is based on the context in which the error occurs.

The motivation behind the weighted EXCLUSIVE OR count
error criterion may be appreciated by examining the EXCLUSIVE
OR error (denoted A ⊕ B) in Figs. 4 and 5. Compare the
EXCLUSIVE OR pattern for the “c” and “o” in Fig. 4 with the
pattern for the pair of “e's” in Fig. 5. Note that the EXCLUSIVE
OR error count for the pair “c” and “o” (count = 23) is
actually less than that for the pair of “e's” (count = 29),
implying that, by this error metric, “c” and “o” are “closer”
than the pair of “e's” are to each other. However, the error
pattern for the pair of “e's,” which should be declared a
match, is composed of sparsely distributed pixels, while the
error pattern for the “o” and “c” shows a dense node of
error pixels corresponding to the missing right segment of the
“o.” One way to quantify the density of such a “node” is
to form a summation in which the “local density” of every
black pixel is merely the sum of all the pixels in its 3 × 3
neighborhood if the pixel is 1, and 0 if the pixel is 0. The
patterns above labeled “weighted XOR error” have been
calculated in this manner. Note that by this criterion, the
associated counts indicate that the pair “c” and “o” are more
separated (count = 131) than are the pair of “e's” (count = 73).
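
The two error criteria can be compared with a short sketch. It assumes patterns are equal-sized lists of rows of 0/1 pixels, that the 3 × 3 neighborhood includes the center pixel, and that pixels beyond the pattern edge count as 0; the function names are hypothetical.

```python
def xor_error(a, b):
    """Pixel-by-pixel EXCLUSIVE OR of two equal-sized binary patterns."""
    return [[pa ^ pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def xor_count(a, b):
    """Conventional error: number of pixels at which a and b differ."""
    return sum(sum(row) for row in xor_error(a, b))


def weighted_xor_count(a, b):
    """Sum, over every error pixel, of the error pixels in its 3 x 3 area."""
    e = xor_error(a, b)
    rows, cols = len(e), len(e[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if e[r][c]:
                total += sum(e[rr][cc]
                             for rr in range(max(r - 1, 0), min(r + 2, rows))
                             for cc in range(max(c - 1, 0), min(c + 2, cols)))
    return total
```

Four clustered error pixels and four isolated ones give the same conventional count, but the clustered group scores higher under the weighted count, which is the behavior the “c”/“o” example relies on.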

In the template matcher, the weighted EXCLUSIVE OR error
is computed for nine translation shifts of a pair of patterns
corresponding to horizontal and vertical single pixel shifts of
the patterns. The minimum error is then compared to a
threshold in order to determine whether or not a match
should be declared. The value of the threshold is a nonlinear
function of the symbol's black count, and is obtained
by an empirically determined look-up table.
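
The nine-shift search can be sketched as below. For brevity the default error function is the plain EXCLUSIVE OR count; a weighted count can be passed in instead. The threshold is supplied by the caller because the empirical look-up table is not given here, and all names are hypothetical.

```python
def shift(pattern, dr, dc):
    """Translate a pattern by (dr, dc), filling vacated pixels with 0."""
    rows, cols = len(pattern), len(pattern[0])
    return [[pattern[r - dr][c - dc]
             if 0 <= r - dr < rows and 0 <= c - dc < cols else 0
             for c in range(cols)]
            for r in range(rows)]


def xor_count(a, b):
    """Number of pixels at which two equal-sized patterns differ."""
    return sum(pa ^ pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def is_match(a, b, threshold, error=xor_count):
    """Minimize the error over the nine single-pixel shifts of b."""
    best = min(error(a, shift(b, dr, dc))
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return best <= threshold, best
```

Taking the minimum over the shifts makes the decision tolerant of a one-pixel misregistration between the blocked symbol and the library prototype.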

E. Library Maintenance

A fixed size library is used in the CSM system. The first
blocked character and its feature vector occupy the first
library slot. The subsequent library slots are occupied by
those blocked characters for which no match is found. In
order to prevent the library from overflowing, a scoring
system is employed to track the usefulness of the library
elements. When the library is filled, the least used prototype
is bumped out of the library and replaced by the new prototype.
At the receiver, the same size library and the same
scoring system are utilized to maintain synchronization with
the transmitter. With a library size of N elements, the scoring
system gives every “new prototype” or “matched symbol”
at least N chances for a match.

Figs. 6 and 7 contain partial library plots of isolated symbols
from two facsimile documents, one a French journal article
(CCITT #4), and the other a Japanese language document
(CCITT #7). The first item on the list is the first isolated
prototype symbol, and all following symbols represent matches
to the prototype.
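
The paper does not spell out the scoring rule, so the following is only one plausible sketch: every new or matched prototype has its score reset to N, every processed symbol ages all scores by one, and the lowest score is replaced when the library is full. The class and method names are hypothetical. Because the rule is deterministic, a receiver running the same code on the decoded symbol stream would stay in step with the transmitter.

```python
class PrototypeLibrary:
    """Fixed-size prototype library with a simple usefulness score (assumed rule)."""

    def __init__(self, size):
        self.size = size
        self.patterns = []   # slot index -> prototype pattern
        self.scores = []     # slot index -> remaining "chances"

    def _age(self):
        # Every processed symbol costs each stored prototype one chance.
        self.scores = [s - 1 for s in self.scores]

    def record_match(self, slot):
        """A matched prototype gets a fresh set of N chances."""
        self._age()
        self.scores[slot] = self.size

    def add_prototype(self, pattern):
        """Store a new prototype, replacing the least useful one when full."""
        self._age()
        if len(self.patterns) < self.size:
            self.patterns.append(pattern)
            self.scores.append(self.size)
            return len(self.patterns) - 1
        slot = min(range(self.size), key=self.scores.__getitem__)
        self.patterns[slot] = pattern
        self.scores[slot] = self.size
        return slot
```

Under this assumed rule the library behaves much like a least-recently-used cache keyed by slot number, so library IDs remain stable between replacements.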
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(b)  RESIDUE 

Fig. 8. Magnified section of CCITT no. 4 and its residue.


F.  Prototype  Coding 

After a symbol has been blocked, a decision threshold is
applied to each prototype element of the library that has
passed the screening test. If a match is indicated, only the
matching library ID and horizontal location with respect to
the previous symbol are coded. Otherwise, the binary pattern
of the blocked symbol is transmitted along with the symbol
width, symbol height, and horizontal location, in addition to
being placed in the library as a new prototype element.

The simplest method of prototype coding is to binary code
the pixels within a block in a raster scan fashion. On the
average, about 30 percent of the prototype code bits can be
eliminated by scanning the prototype pixels in a folded
“basket weave” sequence and applying one-dimensional
Huffman coding of the run lengths. The disadvantages of this
approach are additional implementation complexity and possible
loss of bitstream synchronization when a channel error
occurs. The binary coding approach has been adopted for a
high-performance version of the CSM facsimile coder, and the
folded run-length coding method is used for a very-high-performance
version.
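
The exact “basket weave” sequence is not defined in this section. As an illustration only, the sketch below assumes a serpentine scan (left to right on even rows, right to left on odd rows) so that a run can continue across the end of a row; the Huffman stage is omitted and the function names are hypothetical.

```python
def folded_scan(block):
    """Flatten a 2-D block row by row, reversing direction on every other row."""
    out = []
    for r, row in enumerate(block):
        out.extend(row if r % 2 == 0 else reversed(row))
    return out

def run_lengths(bits):
    """Alternating run lengths, starting with a (possibly empty) white run."""
    runs = []
    current, count = 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs

# A vertical stroke on the right edge of a 3 x 3 block.
stroke = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1]]
raster_runs = run_lengths([p for row in stroke for p in row])
folded_runs = run_lengths(folded_scan(stroke))
```

For the stroke above, the raster scan produces six short runs while the folded scan produces four, which suggests why folding can shorten the run-length code for characters with strokes near a block edge.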

G. Residue Coding

In many documents, there are black pixel patterns that do
not meet the criteria of prototype characters. Examples include
exceptionally large or exceptionally small alphanumeric
characters, segments of company logos, and segments of
handwritten script. In the CSM system, these patterns are
rejected by the symbol blocker, and then left behind as a
residue to be coded by two-dimensional run-length coding.
Fig. 8 presents a blow up of a section of a facsimile document
(CCITT #4) and its corresponding residue.

Conceptually, the CSM system could employ any type of
run-length coding method for residue coding. The selection
should be made on the basis of coding performance, tolerance
to channel errors, implementation complexity, and compatibility
with facsimile standards. Considering these factors,
a modified version of the CCITT two-dimensional run-length
coding algorithm has been selected for the residue coder. By
inhibiting the symbol matching process, the CSM coder will
automatically revert to a pure residue coder, which can be
made exactly compatible with the CCITT standard.

H. Transmission Code

The CSM facsimile coding system produces an asynchronous
code that is dependent upon the contents of the document to
be coded. Table I contains a detailed specification of the code
elements and Fig. 9 contains a state diagram defining the code.
The code word lengths in this specification have been optimized
for a scan resolution of 8 X 8 pixels/mm.
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TABLE  I 

CLI Combined Symbol Matching Facsimile Code


I. Extensions of CSM Concept

In a typical business letter scanned at 8 X 8 pixels/mm,
about 40 percent of the compressed code bits are devoted to
the transmission of prototype symbols. Almost all of this
portion of the transmission code can be eliminated if the
documents to be transmitted are restricted to a fixed set of
symbols, for example, Courier typewriter font. In this case,
the transmitter and receiver libraries can be prestored with
symbols. Isolated unknown symbols detected in the key
pixel scanning process that do not match a library entry can
be placed in the residue for subsequent run-length coding.

The symbol matching process in the CSM system is not
exact; a match tolerance is permitted between symbols to
accommodate perturbations in symbol shape caused by the
scanning process. As a consequence, in the basic CSM system,
a reconstructed document is not an exact pixel-by-pixel replica
of the original document. Although symbol substitution
errors are extremely rare, there may be applications in which
exact coding is demanded. This mode of operation can be
accommodated in the CSM system by a simple modification of
the coder and decoder. At the coder, after a successful match,
the EXCLUSIVE OR between the pair of matched symbols is
formed and placed in the residue for subsequent run-length
coding. At the decoder, the pixel arrays generated from
reconstructed symbols and reconstructed residue are combined
in an EXCLUSIVE OR fashion to correct for differences in the
pair of matched symbols. In this manner, exact reproduction
is achieved. However, the “overhead” associated with the
exact reproduction mode of operation can reduce the achievable
compression ratio by as much as 50 percent at 8 X 8
pixels/mm resolution.
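
The exact reproduction mode relies on the identity (P XOR (S XOR P)) = S. A minimal sketch, assuming the scanned symbol and the matched prototype are aligned binary arrays of the same size (the function names are hypothetical):

```python
import numpy as np

def exact_residue(symbol, prototype):
    """Coder side: pixels where the scanned symbol and its matched prototype differ."""
    return np.logical_xor(symbol, prototype)

def reconstruct(prototype, residue):
    """Decoder side: XOR the prototype with the decoded residue to recover the symbol."""
    return np.logical_xor(prototype, residue)

symbol = np.array([[1, 0, 1],
                   [0, 1, 1]], dtype=bool)
prototype = np.array([[1, 1, 1],
                      [0, 1, 0]], dtype=bool)
restored = reconstruct(prototype, exact_residue(symbol, prototype))
```

Here `restored` equals `symbol` pixel for pixel; the cost is that the difference pixels must be carried in the residue, which is the overhead mentioned above.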

III.  Symbol  Recognition  Mode 

The CSM algorithm achieves facsimile data compression by
the matching of document symbols against a library of symbols
accumulated during the document scan. If a match occurs,
the library index is transmitted rather than the symbol
binary pattern. This basic concept can be extended to perform
symbol recognition by preloading the library with the
binary symbol patterns of a predetermined set of symbol
fonts. The coder can then operate in a symbol recognition
mode in which only the ASCII codes are transmitted and all
other document data such as a signature or logo are ignored.

A. Line Tracking

In the western world, printed matter is “read” from left to
right and from top to bottom. Therefore, a symbol blocking
system that transmits its output to a serial ASCII terminal
must do the same. However, the CSM algorithm extracts
characters from the document being scanned in a totally different
fashion. As the line buffer scrolls through the page
from top to bottom, the tallest of first encountered characters
are removed from the document and processed through
the recognition algorithm. Thus characters emerge from the
CSM process in a sequence which would be totally incomprehensible
if viewed in chronological sequence. In the conventional
CSM facsimile transmission mode, this is of no consequence,
since characters are placed in their appropriate
address locations regardless of their order of occurrence. In
the serial symbol recognition mode, the transmitter will assign
each character an ASCII code, assemble the codes into
lines, inserting blanks, line feeds, carriage returns, etc., and
transmit the lines serially to the receiver. For single spaced or
rotated documents, this “line tracking” is more difficult than
one would imagine. The problem is basically that of grouping
the characters into lines. Determining the sequence in which
they should be transmitted is relatively easy since the characters
may be sorted by their column addresses. A significant
benefit of this serial ASCII mode is that no information on
character location need be transmitted, since the correct sequence
is all that is required in order to properly reconstruct
the received document.

The line-tracking algorithm is based on a straight line fit of
the key pixel coordinates of characters on a text line, as
illustrated in Fig. 10. The straight line is defined parametrically
as

$$R = S \cdot C + O \tag{3}$$

where R represents the row index, C is the column index, S
denotes the text line slope, and O is its offset. As characters
are encountered, they are assigned to the nearest straight line
representing a text line. The algorithm is as follows:

1) The coordinates (C, R) of the first encountered character
are used as a “seed” to start a cluster at S = 0, O = R.

2) The (C, R) coordinates of the next character encountered
are used to compute the error $E = |R - S \cdot C - O|$ for the
slope and offset of each cluster.

3) If the error is less than a threshold for a given cluster,
the character is assigned to that cluster (text line). If it is
greater than the threshold for all clusters, the oldest cluster
is dumped, and a new cluster is started.

4) If the character was added to an existing cluster, the
values of slope and offset are updated by use of minimum-mean-square
error techniques.
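
The four steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the error is taken as the absolute row deviation from the fitted line, the number of simultaneously tracked clusters (`max_lines`) is an assumed parameter, and the update in step 4 is an ordinary least-squares refit over the cluster's points. All names are hypothetical.

```python
class TextLine:
    """One cluster of characters fitted by the line R = S * C + O."""

    def __init__(self, c, r):
        self.points = [(c, r)]
        self.slope, self.offset = 0.0, float(r)   # step 1: S = 0, O = R

    def error(self, c, r):
        # Step 2: deviation of the character from this cluster's line.
        return abs(r - self.slope * c - self.offset)

    def add(self, c, r):
        # Step 4: least-squares refit of slope and offset.
        self.points.append((c, r))
        n = len(self.points)
        mean_c = sum(p[0] for p in self.points) / n
        mean_r = sum(p[1] for p in self.points) / n
        var_c = sum((p[0] - mean_c) ** 2 for p in self.points)
        if var_c > 0:
            cov = sum((p[0] - mean_c) * (p[1] - mean_r) for p in self.points)
            self.slope = cov / var_c
        self.offset = mean_r - self.slope * mean_c


def track_lines(chars, threshold, max_lines=4):
    """Group (column, row) key pixel coordinates into text lines."""
    active, finished = [], []
    for c, r in chars:
        best = min(active, key=lambda ln: ln.error(c, r), default=None)
        if best is not None and best.error(c, r) < threshold:
            best.add(c, r)                         # step 3: assign to nearest line
        else:
            if len(active) == max_lines:
                finished.append(active.pop(0))     # dump the oldest cluster
            active.append(TextLine(c, r))
    finished.extend(active)
    # Within a line, transmission order is simply increasing column address.
    return [sorted(ln.points) for ln in finished]
```

For example, characters from two slightly sloped lines arriving in interleaved order, such as `[(0, 10), (0, 40), (100, 11), (100, 41), (200, 12), (50, 40.5)]` with a threshold of 5, are separated into two lines of three characters each.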


B.  Handling  of  Special  Characters 

A  number  of  characters  which  consist  of  two  “subcharacters” 
must  be  treated  as  special  cases  in  the  symbol-recognition 
mode.  This  is  because  the  blocker/matcher  would  otherwise 
fragment  them  into  their  constituent  parts  and  give  misleading 
results. These characters are: (i), (j), (!), (?), (:), (;), (=), and
("). After recognition of the two parts of the character, the
system  will  check  if  two  compatible  symbols  are  on  top  or 
almost  on  top  of  each  other.  If  so,  the  two  symbols  are 
merged into one. For example two (-)’s on top of each other
will be merged into an (=).
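
The merging check can be sketched as below. The merge table, the tolerances, and the `(char, column, row)` tuple layout are all assumptions made for illustration; the paper does not list the actual compatibility rules.

```python
# Hypothetical (upper, lower) -> merged character table; illustrative only.
MERGE_TABLE = {(".", "."): ":", (".", ","): ";", ("-", "-"): "="}

def merge_stacked(symbols, max_col_offset=2, max_row_gap=10):
    """Merge compatible symbols lying (almost) on top of each other.

    symbols: list of (char, column, row) tuples, with rows increasing downward.
    """
    merged, used = [], set()
    for i, (ch_a, col_a, row_a) in enumerate(symbols):
        if i in used:
            continue
        for j in range(i + 1, len(symbols)):
            if j in used:
                continue
            ch_b, col_b, row_b = symbols[j]
            if abs(col_a - col_b) > max_col_offset:
                continue
            if abs(row_a - row_b) > max_row_gap:
                continue
            upper, lower = (ch_a, ch_b) if row_a <= row_b else (ch_b, ch_a)
            if (upper, lower) in MERGE_TABLE:
                merged.append((MERGE_TABLE[(upper, lower)],
                               min(col_a, col_b), min(row_a, row_b)))
                used.update((i, j))
                break
        else:
            # No partner found: keep the symbol unchanged.
            merged.append((ch_a, col_a, row_a))
    return merged
```

Symbols without a compatible partner pass through unchanged, so ordinary characters are unaffected by this step.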

IV.  Compression  Ratio  Evaluation 

The CSM system has been extensively evaluated by computer
simulation to optimize its performance and to determine
its compression ratio with respect to other coding methods.

A.  Facsimile  Mode  Evaluation 

The CCITT document set of eight digitized documents of
200 X 200 lines/in (8 X 8 pixels/mm) resolution, shown in
Fig. 11, has been used for evaluation of the CSM system in its
facsimile mode of operation. Tables II and III contain listings
of the compression ratios for each of the documents for the
high-performance and very-high-performance versions of the
CSM algorithm, respectively. These tables also contain the bit
allocations for each of the code elements defined in Table I.

Table IV presents a summary comparison of the compression
ratios of the high-performance and very-high-performance
CSM systems with several other facsimile coding methods.
The modified Huffman code is the CCITT adopted standard
for one-dimensional run-length coding [2]. The IBM code
[7], READ code [8], and BPO code [9] are proposals for a
CCITT standard employing two-dimensional run-length coding.
These algorithms all provide for an end-of-line code. All
of the algorithms in Table IV have been simulated and evaluated
on the same set of digitized documents scanned at the
University of Hannover, Germany. The K factor indicates the
number of lines in which the coder is operated in its two-dimensional
mode before it reverts to a one-dimensional mode
to limit the propagation of errors.

Comparison of the compression performance of these algorithms
indicates that the CSM methods outperform the run-length
coding techniques substantially for text-predominate
documents, and perform at about the same level as the best
of the two-dimensional run-length coding methods for
graphics-predominate documents.

B.  Symbol  Recognition  Mode  Evaluation 

The symbol recognition mode system has been tested with
86 sets of data, each containing 1000 samples of one of the
86 symbols of the Courier 10 font. In these tests, no mismatches
occurred, and only very badly damaged characters
were rejected.

Fig. 12 contains an example of a business letter and its
reconstruction with the symbol matching coding mode of operation.
It should be noted that the reconstructed letter has been
printed with a different font than the original; however, the
format and spacing of the two letters are in basic agreement.
The compression factor obtained for this document for operation
of the CSM system in the symbol matching mode is about
257:1 and for operation in the facsimile mode is about 49:1.

V.  System  Implementation 

Although the CSM system is more complex to implement
than a conventional two-dimensional run-length coding system,
with the advent of high-speed and relatively inexpensive
memory, discrete logic circuits, and microprocessors,
implementation complexity has ceased to be a deterrent to the
development of high-performance systems. A 100 X 100
lines/in (4 X 4 pixels/mm) facsimile coder using the CSM
algorithm was introduced by Compression Labs, Inc. of
Cupertino, CA, in Fall 1978. This unit utilizes a microprocessor
to implement the algorithm for transmission at subminute
page rates. A discrete logic implementation of the
CSM algorithm is being developed by Compression Labs for
transmission rates of less than 5 s for a 200 X 200 lines/in page.

VI.  Summary 

A new high-performance method of facsimile data compression,
called CSM, has been introduced. The coding system
involves segmentation of a document into symbols, that are
coded by template matching, and into a residue of the remainder
of the document, that is coded by two-dimensional
run-length coding. Computer evaluation indicates that the
compression factor for text-predominate documents is about
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TABLE  II 

High-Performance Code Summary

DOCUMENT          1       2       3       4       5       6       7       8
srair           <99     397    2997    1613    1700    2299    6936    1792
LZMSYV          603     603     603     603     603     603     603     603
SYHTtfi        2121    2129    2129    2129    2129    2129    2129    2129
COLADD         269?     399    4169    •767    4999    1990   11176    1594
MATFLG          991      37    1291    4019    1754     330    2522     156
LIBID           SMS     169    7974   26292   10773    1293    9970     434
EOWOO          1710      49    2164    7912    3079     359    2920     124
BLKSIZ         1330     130    2090    2990    2190    1510   11120     940
BLKCOD        24691    3404   47036   99360   44799   30534  406471   29115
DELCOL         •227     233    9341   34177   14055    2431   25692    1024
mcoo          11994   91323   93990   20792   42014   93947   19949  191353
TOTAL         67492   99946  132933  163909  127939  137273  499093  219253
COMP. RATIO    94.9    37.2    27.7    22.4    29.7    26.9     7.4    16.9

TABLE III

Very-High-Performance Code Summary

DOCUMENT          1       2       3       4       5       6       7       8
STTBZT          439     266    1431    1192    1152    1960    3979     936
LIMY* nr        603     603     603     603     603     603     603     603
«THTLQ         2129    2129    2129    2129    2120    2129    2129    2129
COLADD         2497     395    4169    •767    4995    1990   11176    1594
MATFLG          999      37    1291    4015    1754     330    2522     156
LIBID          9999     169    7574   26292   10773    1253    9970     434
RWOI           1710      49    2164    7512    3079     359    2920     124
BLKSIZ         1330     130    2090    2990    2150    1510   11120     940
BLKCOD        16941    2099   30929   37399   29513   20937  322790   19239
DELCOL         9227     233    9341   34177   14095    2431   25692    1024
jyggg         19594   91323   53990   20732   42014   93947   19945  191353
TOTAL         59342   99946  119600  143417  111119  123317  412539  207921
COMP. RATIO    62.0    37.9    31.9    29.3    22.1    29.0     9.9    17.7
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TABLE  IV 

Compression Ratios for Coding of CCITT Document Set with Various Coding Algorithms

CCITT      CCITT   READ   READ   IBM    IBM    BPO    BPO    CSM H.P.  CSM V.H.P.
document   1-D     K=4    K=∞    K=4    K=∞    K=4    K=∞    K=∞       K=∞
1          19.2    21.1   24.9   20.5   23.2   20.5   23.2   94.9      62.0
2          19.1    22.7   32.0   24.1   29.1   29.7   32.5   37.2      37.2
3          •a7     13.•   15.3   13.2   19.5   13.3   19.2   27.7      31.8
4          9.1     t.s    7.1    5.5    7.2    5.1    7.0    22.4      29.3
5          1.9     12.7   14.7   12.3   14.1   12.4   14.3   22.7      33.1
6          10.2    19.0   29.1   17.(   22.9   12.1   23.9   25.2      29.0
7          4.1     (.1    5.7    5.1    5.5    5.1    5.5    7.4       2.9
8          7.9     19.1   20.2   12.9   19.1   14.0   12.2   15.2      17.7

Fig. 12. Business letter: (a) original; (b) reproduction (compression in CSM mode).
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twice  that  obtained  with  two-dimensional  run-length  coding 
and about the same for graphics-predominate documents.

The CSM system can be operated in a pure symbol recognition
mode in which a document is coded by recognition of its
alphanumeric symbols. Compression ratios greater than 250:1
can be achieved on business letters in this mode of operation.
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Appendix  C 


Proposal  for  Mixed  Character  Coded  Text  and  Facsimile 


Areas  on  the  Same  Page 



Questions : 16/VIII

10,11/XIV


TITLE : PROPOSAL FOR MIXED CHARACTER CODED TEXT AND FACSIMILE AREAS ON THE
SAME PAGE.

SOURCE : MALE S.A.

1. Summary

The aim of this contribution is to propose a method for :

a) the insertion of any number of rectangular facsimile areas, positioned
anywhere throughout the page, within Teletex coded text;

b) the transmission of such mixed-mode pages in a manner that allows printing
by relatively simple receivers.

2.  General 

There is an increasing desire for the transmission of pages presenting
mixed text and images, on one hand. On the other hand, work has been undertaken
to standardize Group 4 facsimile apparatus for transmissions over the public
data networks. In this field there is a growing tendency towards the use of a
common control procedure for Teletex and Facsimile communications.

In view of these considerations, the transport service and the end-to-end
control procedures, already defined for Teletex in Recommendations S.70 and S.62
respectively, seem to offer valuable and flexible tools common to the three
services (Teletex, Group 4 Facsimile, and Mixed text + image), owing to the
offered advantages : universality and error recovery, among others.
From this community of transport service and control procedures, two
consequences arise :

- The existing family of facsimile equipments (including Group 3) makes use of
quite different procedures. To avoid making the new Group 4 equipments unduly
costly, their direct communication possibilities with the existing range of
facsimile equipments become highly questionable.

- Pure Teletex and pure Group 4 Facsimile services could perhaps be considered as
simplified limit cases, derived from the more universal "Mixed mode" service.

Starting from this philosophy, the following family of terminals could
then appear :

(1) Mixed mode terminal (includes : Mixed text + image facility, Teletex,
Group 4 Facsimile, ... other services ...).

(2) Teletex terminal.

(3) Group 4 Facsimile terminal.

It is obvious that the simple machines (2) and (3) might be equipped
with additional options, to obtain intermediate equipments with growing capabilities.
As an example, by adding to the terminal (3) a character generator and some extra
processing power, one could build a Group 4 terminal (3') having also the capability
to receive and print character coded text.

The following communications over the public networks are then possible :
In order to keep the benefit of previous standardisation efforts in the
Teletex and Facsimile fields, it seems reasonable to adopt the following assumptions.


The encoding of the text characters, for transmission, shall be in
accordance with the Teletex repertoire (Rec. S.61). The encoding of the facsimile
information shall be in accordance with the unidimensional or bidimensional encoding
schemes (Rec. T.4).


The encoding of some new commands needed by the mixed service shall also
remain compatible with the mentioned encoding schemes.

3.2. Resolutions for facsimile and for text

Normally, terminals such as (1) and (3) (or 3'), capable of receiving both
images and text, will be equipped with printing mechanisms operating in accordance
with the facsimile resolutions. Previous work done by others (see COM XIV-No. 99)
and by ourselves has shown that Teletex messages printed with T.4 facsimile
resolutions are of good quality. Consequently :

- The horizontal and vertical resolutions for the facsimile parts of a page shall
conform to Rec. T.4.

- The character pitch and the line spacings in the text parts shall conform to the
basic Teletex standards given in Rec. S.61.

3.3. Shape of the image areas

For management simplicity, it is assumed that the image areas are
rectangular-shaped, with edges parallel to the paper edges.


4. The structure of a page containing mixed information

The composition and the transmission of a composite page needs a
quantitative structuring of its area, compatible with both Teletex standards
(character pitch and line spacing) and Facsimile resolutions. This can be
obtained as follows.

4.1. Page partitioning

It is proposed to cover the page with a virtual grid of equally spaced
vertical lines and equally spaced horizontal lines, thus defining horizontal (H)
and vertical (V) coordinates.

The origin (0,0) should be ideally situated at the top left corner of the paper
sheet.


The values of the H and V units will be determined in section 4.3.


4.2.  Image  positioning 

Any  image  area  shall  be  delimited  by  lines  of  the  grid.  This  positioning 
method  allows  for  the  insertion  of  several  images  on  a  page,  and  does  include  the 
limit  cases  of  : 

- a slice of facsimile covering the total width of the page.

-  a  full  facsimile  page. 

4.3. Character positioning

It is known (see also COM XIV-No. 99) that the basic character pitch and
line spacings of Teletex do not correspond to an integer number of picture
elements (horizontally) and scan lines (vertically) as defined in Rec. T.4 for
facsimile.

When accepting a small approximation (less than 2 %) for the Teletex
parameters, it is however possible to define a common grid as shown in Fig. 1,
where the horizontal and vertical units are given by :

$$\text{H unit} = 20 \text{ pels} = 20 \times \tfrac{215}{1728} \text{ mm} = 2.49 \text{ mm} \quad (-1.97\,\% \text{ w.r. to } 2.54 \text{ mm})$$

$$\text{V unit} = 16 \text{ scan lines} = 16 \times \tfrac{1}{7.7} \text{ mm} = 2.08 \text{ mm} \quad (-1.83\,\% \text{ w.r. to } 1/12 \text{ inch})$$
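
The grid arithmetic can be checked with a short calculation. The sketch below assumes the Rec. T.4 figures of 1728 pels across a 215 mm scan line and 7.7 scan lines/mm, a Teletex character pitch of 1/10 inch, and a vertical reference of 1/12 inch; the variable names are illustrative.

```python
INCH_MM = 25.4
PEL_MM = 215 / 1728        # assumed T.4 horizontal pel pitch in mm
LINE_MM = 1 / 7.7          # assumed T.4 scan line pitch in mm (7.7 lines/mm)

h_unit = 20 * PEL_MM       # horizontal grid unit: 20 pels
v_unit = 16 * LINE_MM      # vertical grid unit: 16 scan lines

pitch = INCH_MM / 10       # Teletex character pitch, 2.54 mm
half_line = INCH_MM / 12   # 1/12 inch, about 2.117 mm

# Relative deviations in percent.
h_dev_exact = (h_unit - pitch) / pitch * 100
h_dev_rounded = (round(h_unit, 2) - pitch) / pitch * 100
v_dev = (v_unit - half_line) / half_line * 100

print(f"H unit = {h_unit:.2f} mm, deviation {h_dev_rounded:.2f} % "
      f"(unrounded {h_dev_exact:.2f} %)")
print(f"V unit = {v_unit:.2f} mm, deviation {v_dev:.2f} %")
```

Under these assumptions the vertical deviation comes out at -1.83 %, while the horizontal figure of -1.97 % is reproduced only when the unit is first rounded to 2.49 mm; the unrounded value gives about -2.03 %.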
It is to be noted that the negative approximations are in favour of
the integrity of the reproduced pages.

With such a grid, any intersection (H,V) represents the virtual position
of a text character, taking into account all basic line spacings defined in S.61.

5. Transmission of the mixed information

Depending on the objectives in view, the contents of a composite page may
be segmented in various ways for transmission.

The "baseline by baseline" method proposed hereunder allows reception
and printing of the mixed text and image information without the need of a full page
storage. This is important when considering that the storage of a facsimile part
needs about 20 times more memory capacity than the storage of a same area of
Teletex. (One A4 Teletex page : 1.5...3 koctets; one A4 Facsimile page : 30...60
koctets).


5.1. The "baseline by baseline" method


For better understanding, the method will be explained on the basis of Fig. 2,
which represents an example of an image associated with text in the same horizontal
slice of the page.


Along each baseline of text, the first transmitted string contains in this
case the Teletex coded characters of the left part; the next string contains the
facsimile coded information corresponding to all the successive facsimile lines
(scanned from top to bottom) of the subimage slice comprised between the current
baseline and the previous one; the last string contains the Teletex coded
characters of the right part.

Between strings of different nature, adequate delimiting commands must be inserted.
In the case of Fig. 2 the strings are as follows :


Line 1 : <Text a> D1 <EOL/Subimage a.../EOL> D2 <Text a'/SVS/CR/LF>
         (Teletex)   (32 facsimile lines)       (Teletex)

Line 2 : <Text b> D1 <EOL/Subimage b.../EOL> D2 <Text b'/SVS/CR/LF>
                     (48 facsimile lines)

....

Line 5 : <Text e> D1 <EOL/Subimage e.../EOL> D2 <Text e'/CR/LF>
                     (32 facsimile lines)

SVS stands symbolically for the Teletex parametric control function 'Select
vertical spacing'.

D1 is a delimiting command from Teletex to Facsimile fields.

D2 is a delimiting command from Facsimile to Teletex fields.

It is seen that an image is split into several subimages which are
transmitted with character strings in between.


It is the responsibility of the source system management to inject
the delimiting commands and to possibly insert dummy characters (spaces e.g.)
to maintain the correct alignment of the subimages.
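
The ordering of strings along one baseline can be sketched as follows. The byte values used for the two delimiting commands are placeholders only, since their encoding is left open in this contribution, and the function name and arguments are hypothetical; the SVS control function is omitted for brevity.

```python
# Placeholder delimiters; the real encodings are not fixed by this proposal.
D1 = b"<D1>"   # text-to-facsimile delimiting command
D2 = b"<D2>"   # facsimile-to-text delimiting command

def baseline_strings(left_text, subimage, right_text):
    """Return the ordered strings transmitted for one text baseline.

    left_text / right_text: character strings left and right of the image.
    subimage: facsimile coded bytes of the subimage slice lying between the
              previous and the current baseline, or None if there is no image.
    """
    parts = [left_text.encode("ascii")]
    if subimage is not None:
        parts += [D1, subimage, D2]
    parts.append(right_text.encode("ascii") + b"\r\n")   # CR/LF ends the baseline
    return parts

# One baseline with a two-byte stand-in for the coded subimage slice.
line_1 = baseline_strings("Text a", b"\x00\x10", "Text a'")
```

Because each baseline carries only the subimage slice above it, a receiver can print the slice and the text of that baseline before the next one arrives, which is the basis of the claim that no full page storage is needed.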


5.2. Definition of the "delimiting command"


Two  methods  for  indicating  the  delimiting  points  between  facsimile  and 
text  coded  information  may  be  considered. 


5.2.1. Delimiting within the user data field of a CDUI command


In this case the delimiting commands must be compatible with the
repertoire of the current context.

- D1 : "Text to image" command.

It must be chosen in accordance with the rules applicable to Teletex control
characters. It may be :

ESC X (X to be defined in a new subrepertoire)
or CSI + parameters.
The CSI command is preferred, since its parameter field may indicate,
apart from the "delimiting" point, some extra characteristics of the next
facsimile string, such as vertical resolution and number of pels per
scan line in the subimage. The use of such possibilities is however
mentioned as "for further study" in Rec. S.61, para. 3.3.4.

- D2 : "Image to text" command.

It must be a string which is never encountered in the facsimile encoded
data field. E.g. for unidimensional coding :

D2 = <001>H <001>H, where <001>H means EOL in hexadecimal
or  K.'-EOL* 

The single EOL is maintained as normally to indicate the end of each
scan line.

5.2.2. Delimiting at document procedure level

Another method consists in introducing a CDUI command at each delimiting
point.
In S.62, the possibility is left open for adding parameters in the CDUI
header. These parameters could define the delimiting command, along with
additional parametric characteristics for the next user data field. In this case
the need for defining specific "in text" D1 and D2 commands disappears.

This method however entails more overhead and thus less efficient
communications.

5.3. Flow control

In basic Teletex, the error recovery mechanism is based on a RU (recovery
unit) of one page; in facsimile and mixed mode, this implies a large memory
capacity for storing at least one full page in the receiver.


In Rec. S.62 § C, an optional error recovery mechanism is offered,
allowing for RU of smaller size by introducing several recovery points within the
page.


This last procedure could be a basis to implement also flow control
in order to match the rate of information transfer to the capability of the
receiver, without the need for inserting fill bits at the end of facsimile scan
lines (method of Rec. T.4).

6.  Conclusions 

The proposed method for transmission of a page composed of mixed
character coded areas and facsimile coded areas presents the following advantages:

- no need for a large storage capacity in the receiver

- high flexibility in the number and dimensions of images

- printing possible in terminals equipped with standard facsimile printing
heads (T.4 resolutions)

- no need for the transmission of coordinates for images and text

- full compatibility with basic Teletex terminals

- possibility of changing the vertical facsimile resolutions for each image
(and even for each subimage).
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